THE PRIMES CONTAIN ARBITRARILY LONG ARITHMETIC PROGRESSIONS

BEN GREEN AND TERENCE TAO 

Abstract. We prove that there are arbitrarily long arithmetic progressions of primes. 

There are three major ingredients. The first is Szemeredi's theorem, which asserts
that any subset of the integers of positive density contains progressions of arbitrary
length. The second, which is the main new ingredient of this paper, is a certain
transference principle. This allows us to deduce from Szemeredi's theorem that any subset
of a sufficiently pseudorandom set (or measure) of positive relative density contains
progressions of arbitrary length. The third ingredient is a recent result of Goldston
and Yildirim, which we reproduce here. Using this, one may place (a large fraction
of) the primes inside a pseudorandom set of "almost primes" (or more precisely, a
pseudorandom measure concentrated on almost primes) with positive relative density.



1. Introduction 

It is a well-known conjecture that there are arbitrarily long arithmetic progressions 
of prime numbers. The conjecture is best described as "classical", or maybe even 
"folklore". In Dickson's History it is stated that around 1770 Lagrange and Waring 
investigated how large the common difference of an arithmetic progression of L primes 
must be, and it is hard to imagine that they did not at least wonder whether their 
results were sharp for all L. 

It is not surprising that the conjecture should have been made, since a simple heuristic
based on the prime number theorem would suggest that there are $\gg N^2/\log^k N$
$k$-tuples of primes $p_1, \dots, p_k$ in arithmetic progression, each $p_i$ being at most $N$.
Hardy and Littlewood [24], in their famous paper of 1923, advanced a very general conjecture
which, as a special case, contains the hypothesis that the number of such $k$-term
progressions is asymptotically $C_k N^2/\log^k N$ for a certain explicit numerical factor
$C_k > 0$ (we do not come close to establishing this conjecture here, obtaining instead a
lower bound $(\gamma(k) + o(1)) N^2/\log^k N$ for some very small $\gamma(k) > 0$).

The first theoretical progress on these conjectures was made by van der Corput [12] (see
also [8]) who, in 1939, used Vinogradov's method of prime number sums to establish the
case $k = 3$, that is to say that there are infinitely many triples of primes in arithmetic
progression. However, the question of longer arithmetic progressions seems to have
remained completely open (except for upper bounds), even for $k = 4$. On the other
hand, it has been known for some time that better results can be obtained if one
replaces the primes with a slightly larger set of almost primes. The most impressive
such result is due to Heath-Brown [25]. He showed that there are infinitely many 4-term
progressions consisting of three primes and a number which is either prime or a product
of two primes. In a somewhat different direction, let us mention the beautiful results of
Balog [2, 3]. Among other things he shows that for any $m$ there are $m$ distinct primes
$p_1, \dots, p_m$ such that all of the averages $\frac{1}{2}(p_i + p_j)$ are prime.

The problem of finding long arithmetic progressions in the primes has also attracted the 
interest of computational mathematicians. At the time of writing the longest known 
arithmetic progression of primes is of length 23, and was found in 2004 by Markus Frind, 
Paul Underwood, and Paul Jobling: 

$56211383760397 + 44546738095860k$; $k = 0, 1, \dots, 22$.

An earlier arithmetic progression of primes of length 22 was found by Moran, Pritchard
and Thyssen [32]:

$11410337850553 + 4609098694200k$; $k = 0, 1, \dots, 21$.
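Both progressions are small enough to be checked directly. The following Python sketch is an illustration added here, not part of the original argument. It assumes a Miller-Rabin test with the fixed bases 2, 3, ..., 37, which is known to be deterministic for integers far larger than the terms involved (below about $3 \times 10^{23}$). The function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Miller-Rabin with fixed bases; deterministic for n below about 3 * 10**23."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # the base a witnesses that n is composite
    return True


def is_prime_progression(start: int, step: int, length: int) -> bool:
    """Check that start + step * k is prime for k = 0, ..., length - 1."""
    return all(is_prime(start + step * k) for k in range(length))


# The two record progressions quoted in the text.
ap23 = is_prime_progression(56211383760397, 44546738095860, 23)
ap22 = is_prime_progression(11410337850553, 4609098694200, 22)
```

Note that the common differences are divisible by every prime up to 23 and 19 respectively, as they must be for progressions of these lengths.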

Our main theorem resolves the above conjecture. 

Theorem 1.1. The prime numbers contain infinitely many arithmetic progressions of 
length k for all k. 

In fact, we can say something a little stronger: 

Theorem 1.2 (Szemeredi's theorem in the primes). Let $A$ be any subset of the prime
numbers of positive relative upper density, thus
$\limsup_{N \to \infty} \pi(N)^{-1} |A \cap [1, N]| > 0$,
where $\pi(N)$ denotes the number of primes less than or equal to $N$. Then $A$ contains
infinitely many arithmetic progressions of length $k$ for all $k$.

If one replaces "primes" in the statement of Theorem 1.2 by the set of all positive integers
$\mathbb{Z}^+$, then this is a famous theorem of Szemeredi [38]. The special case $k = 3$ of
Theorem 1.2 was recently established by the first author [21] using methods of Fourier analysis.
In contrast, our methods here have a more ergodic theory flavour and do not involve
much Fourier analysis (though the argument does rely on Szemeredi's theorem which
can be proven by either combinatorial, ergodic theory, or Fourier analysis arguments).
We also remark that if the primes were replaced by a random subset of the integers,
with density at least $N^{-1/2+\varepsilon}$ on each interval $[1, N]$, then the $k = 3$ case of
the above theorem was established in [30].

2. An outline of the proof

Let us start by stating Szemeredi's theorem properly. In the introduction we claimed 
that it was a statement about sets of integers with positive upper density, but there are 
other equivalent formulations. A "finitary" version of the theorem is as follows. 

Proposition 2.1 (Szemeredi's theorem). [37, 38] Let $N$ be a positive integer and let
$\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$¹. Let $\delta > 0$ be a fixed positive real number, and let
$k \geq 3$ be an integer. Then there is a minimal $N_0(\delta, k) < \infty$ with the following
property. If $N \geq N_0(\delta, k)$ and $A \subseteq \mathbb{Z}_N$ is any set of cardinality at
least $\delta N$, then $A$ contains an arithmetic progression of length $k$.
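To make the statement concrete, here is a minimal brute-force Python sketch (an illustration added here, with a hypothetical function name). It assumes that "arithmetic progression" means a progression $x, x+r, \dots, x+(k-1)r$ in $\mathbb{Z}_N$ with $r \neq 0$, and it allows progressions to wrap around modulo $N$.

```python
from itertools import product


def contains_ap(A, N: int, k: int) -> bool:
    """Does A, viewed as a subset of Z_N, contain x, x+r, ..., x+(k-1)r with r != 0?"""
    residues = {a % N for a in A}
    return any(
        all((x + j * r) % N in residues for j in range(k))
        for x, r in product(residues, range(1, N))
    )


dense_example = contains_ap({0, 1, 2}, 7, 3)    # 0, 1, 2 is a progression
wrap_example = contains_ap({5, 6, 0}, 7, 3)     # 5, 6, 0 wraps around modulo 7
sparse_example = contains_ap({0, 1, 3}, 7, 3)   # no progression with r != 0
```

The set $\{0, 1, 3\}$ has density $3/7$ in $\mathbb{Z}_7$ yet contains no 3-term progression, which is consistent with the proposition only guaranteeing progressions once $N \geq N_0(\delta, k)$.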

Finding the correct dependence of $N_0$ on $\delta$ and $k$ (particularly $\delta$) is a famous open
problem. It was a great breakthrough when Gowers [18, 19] showed that

$N_0(\delta, k) \leq 2^{2^{\delta^{-c_k}}}$,

where $c_k$ is an explicit constant (Gowers obtains $c_k = 2^{2^{k+9}}$). It is possible that a
new proof of Szemeredi's theorem could be found, with sufficiently good bounds that
Theorem 1.1 would follow immediately. To do this one would need something just a
little weaker than

$N_0(\delta, k) \leq 2^{c_k \delta^{-1}}$ (2.1)

(there is a trick, namely passing to a subprogression of common difference
$2 \times 3 \times 5 \times \dots \times w(N)$ for appropriate $w(N)$, which allows one to consider the
primes as a set of density essentially $\log\log N/\log N$ rather than $1/\log N$; we will use a
variant of this "W-trick" later in this paper to eliminate local irregularities arising from
small divisors). In our proof of Theorem 1.2 we will need to use Szemeredi's theorem, but we
will not need any quantitative estimates on $N_0(\delta, k)$.
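The density gain from the trick in the parenthesis can be observed numerically. The sketch below is an illustration added here under stated assumptions: it restricts to the residue class $n \equiv 1 \pmod{W}$ with $W$ the product of the primes up to a cutoff `w`, and compares the density of primes in that class with their density in $[1, N]$. The function names and the particular values of `N` and `w` are hypothetical choices.

```python
import math


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes; assumes n >= 1."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def density_gain(N: int, w: int) -> float:
    """Prime density in {n <= N : n = 1 mod W} divided by prime density in [1, N]."""
    primes = primes_up_to(N)
    W = math.prod(p for p in primes if p <= w)
    primes_in_class = sum(1 for p in primes if p % W == 1)
    class_size = len(range(1, N + 1, W))
    return (primes_in_class / class_size) / (len(primes) / N)


gain = density_gain(10_000, 5)   # W = 2 * 3 * 5 = 30
```

Heuristically the gain is close to $W/\phi(W)$, which equals $3.75$ for $W = 30$ and grows like a constant times $\log w$ by Mertens' theorem.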

Let us state, for contrast, the best known lower bound which is due to Rankin [35] (see
also Lacey-Laba [31]):

$N_0(\delta, k) \geq \exp(C (\log 1/\delta)^{1 + \lfloor \log_2(k-1) \rfloor})$.

At the moment it is clear that a substantial new idea would be required to obtain a result
of the strength (2.1). In fact, even for $k = 3$ the best bound is
$N_0(\delta, 3) \leq 2^{C \delta^{-2} \log(1/\delta)}$, a result of Bourgain [6]. The hypothetical
bound (2.1) is closely related to the following very open conjecture of Erdos:

Conjecture 2.2 (Erdos conjecture on arithmetic progressions). Suppose that
$A = \{a_1 < a_2 < \dots\}$ is an infinite sequence of integers such that
$\sum_i 1/a_i = \infty$. Then $A$ contains arbitrarily long arithmetic progressions.

This would imply Theorem 1.1.

We do not make progress on any of these issues here. In one sentence, our argument can
be described instead as a transference principle which allows us to deduce Theorems
1.1 and 1.2 from Szemeredi's theorem, regardless of what bound we know for $N_0(\delta, k)$;
in fact we prove a more general statement in Theorem 3.5 below. Thus, in this paper,
we must assume Szemeredi's theorem. However with this one (rather large!) caveat²
our paper is self-contained.

1 We will retain this notation throughout the paper, thus $\mathbb{Z}_N$ will never refer to the
$N$-adics. We always assume for convenience that $N$ is prime. It is very convenient to work in
$\mathbb{Z}_N$, rather than the more traditional $[-N, N]$, since we are free to divide by
$2, 3, \dots, k$ and it is possible to make linear changes of variables without worrying about
the ranges of summation. There is a slight price to pay for this, in that one must now address
some "wraparound" issues when identifying $\mathbb{Z}_N$ with a subset of the integers, but
these will be easily dealt with.

Szemeredi's theorem can now be proved in several ways. The original proof of Szemeredi
[37, 38] was combinatorial. In 1977, Furstenberg made a very important breakthrough
by providing an ergodic-theoretic proof [10]. Perhaps surprisingly for a result about
primes, our paper has at least as much in common with the ergodic-theoretic approach
as it does with the harmonic analysis approach of Gowers. We will use a language
which suggests this close connection, without actually relying explicitly on any
ergodic-theoretical concept³. In particular we shall always remain in the finitary setting of
$\mathbb{Z}_N$, in contrast to the standard ergodic theory framework in which one takes weak
limits (invoking the axiom of choice) to pass to an infinite measure-preserving system. As will
become clear in our argument, in the finitary setting one can still access many tools
and concepts from ergodic theory, but often one must incur error terms of the form $o(1)$
when one does so.

Here is another form of Szemeredi's theorem which suggests the ergodic theory analogy
more closely. We use the conditional expectation notation $\mathbb{E}(f \mid x_i \in B)$ to denote
the average of $f$ as certain variables $x_i$ range over the set $B$, and $o(1)$ for a quantity
which tends to zero as $N \to \infty$ (we will give more precise definitions later).

Proposition 2.3 (Szemeredi's theorem, again). Write $\nu_{const} : \mathbb{Z}_N \to \mathbb{R}^+$ for
the constant function $\nu_{const} \equiv 1$. Let $0 < \delta \leq 1$ and $k \geq 1$ be fixed. Let $N$
be a large integer parameter, and let $f : \mathbb{Z}_N \to \mathbb{R}^+$ be a non-negative function
obeying the bounds

$0 \leq f(x) \leq \nu_{const}(x)$ for all $x \in \mathbb{Z}_N$ (2.2)

and

$\mathbb{E}(f(x) \mid x \in \mathbb{Z}_N) \geq \delta$. (2.3)

Then we have

$\mathbb{E}(f(x) f(x + r) \cdots f(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) \geq c(k, \delta) - o_{k,\delta}(1)$

for some constant $c(k, \delta) > 0$ which does not depend on $f$ or $N$.
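The average on the left-hand side can be evaluated exactly for small $N$. The following Python sketch is an illustration added here, not part of the proof. It represents $f$ as a list of its $N$ values and uses exact rational arithmetic. The function name `ap_average` is hypothetical.

```python
from fractions import Fraction


def ap_average(f, N: int, k: int) -> Fraction:
    """E( f(x) f(x+r) ... f(x+(k-1)r) | x, r in Z_N ), computed exactly."""
    total = Fraction(0)
    for x in range(N):
        for r in range(N):
            term = Fraction(1)
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / N**2


N = 7
nu_const = [Fraction(1)] * N
# Indicator function of {0, 1, 3}, a set with no 3-term progression with r != 0.
indicator = [Fraction(int(x in {0, 1, 3})) for x in range(N)]

full_count = ap_average(nu_const, N, 3)     # equals 1
sparse_count = ap_average(indicator, N, 3)  # only the r = 0 terms survive: 3/49
```

The second value shows that the average also counts the degenerate progressions with $r = 0$. These contribute at most $1/N$, which is absorbed into the $o(1)$ term.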

Remark. Ignoring for a moment the curious notation for the constant function $\nu_{const}$,
there are two main differences between this and Proposition 2.1. One is the fact that
we are dealing with functions rather than sets: however, it is easy to pass from sets to
functions, for instance by probabilistic arguments. Another difference, if one unravels
the $\mathbb{E}$ notation, is that we are now asserting the existence of $\gg N^2$ arithmetic
progressions, and not just one. Once again, such a statement can be deduced from Proposition
2.1 with some combinatorial trickery (of a less trivial nature this time - the argument
was first worked out by Varnavides [43]). A direct proof of Proposition 2.3 can be
found in [30]. A formulation of Szemeredi's theorem similar to this one was also used
by Furstenberg [10]. Combining this argument with the one in Gowers gives an explicit
bound on $c(k, \delta)$ of the form $c(k, \delta) \geq \exp(-\exp(\delta^{-c_k}))$ for some $c_k > 0$.

2 We will also require some standard facts from analytic number theory such as the prime number
theorem, Dirichlet's theorem on primes in arithmetic progressions, and the classical zero-free region for
the Riemann $\zeta$-function (see Lemma A.1).

3 It has become clear that there is a deep connection between harmonic analysis (as applied to solving
linear equations in sets of integers) and certain parts of ergodic theory. Particularly exciting is the
suspicion that the notion of a $k$-step nilsystem, explored in many ergodic-theoretical works (see e.g.
[27, 28, 29, 44]), might be analogous to a kind of "higher order Fourier analysis" which could be used to
deal with systems of linear equations that cannot be handled by conventional Fourier analysis (a simple
example being the equations $x_1 + x_3 = 2x_2$, $x_2 + x_4 = 2x_3$, which define an arithmetic progression
of length 4). We will not discuss such speculations any further here, but suffice it to say that much is left
to be understood.

Now let us abandon the notion that $\nu$ is the constant function. We say that
$\nu : \mathbb{Z}_N \to \mathbb{R}^+$ is a measure⁴ if

$\mathbb{E}(\nu) = 1 + o(1)$. (2.4)


We are going to exhibit a class of measures, more general than the constant function
$\nu_{const}$, for which Proposition 2.3 still holds. These measures, which we will call
pseudorandom, will be ones satisfying two conditions called the linear forms condition and the
correlation condition. These are, of course, defined formally below, but let us remark
that they are very closely related to the ergodic-theory notion of weak-mixing. It is
perfectly possible for a "singular" measure - for instance, a measure for which
$\mathbb{E}(\nu^2)$ grows like a power of $\log N$ - to be pseudorandom. Singular measures are the
ones that will be of interest to us, since they generally support rather sparse sets. This
generalisation of Proposition 2.3 is Theorem 3.5 below.

Once Theorem 3.5 is proved, we turn to the issue of finding primes in AP. A possible
choice for $\nu$ would be $\Lambda$, the von Mangoldt function (this is defined to equal $\log p$ at
$p^m$, $m = 1, 2, \dots$, and $0$ otherwise). Unfortunately, verifying the linear forms condition
and the correlation condition for the von Mangoldt function (or minor variants thereof)
is strictly harder than proving that the primes contain long arithmetic progressions;
indeed, this task is comparable in difficulty to the notorious Hardy-Littlewood prime
tuples conjecture, for which our methods here yield no progress.
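For readers who want to experiment, here is a small Python sketch of the von Mangoldt function (an illustration added here, using plain trial division and hypothetical function names). By the prime number theorem its average over $[1, N]$ tends to 1, which is the normalisation one expects of a measure, even though it is supported on the sparse set of prime powers.

```python
import math


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p**m for a prime p and some m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    # Smallest prime factor of n, found by trial division (n itself if n is prime).
    p = next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def average_of_lambda(N: int) -> float:
    """(1/N) * sum of Lambda(n) over 1 <= n <= N."""
    return sum(von_mangoldt(n) for n in range(1, N + 1)) / N


mean_value = average_of_lambda(1000)
```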

However, all we need is a measure $\nu$ which (after rescaling by at most a constant factor)
majorises $\Lambda$ pointwise. Then, (2.3) will be satisfied with $f = \Lambda$. Such a measure is
provided to us⁵ by recent work of Goldston and Yildirim [17] concerning the size of gaps
between primes. The proof that the linear forms condition and the correlation condition
are satisfied is heavily based on their work, so much so that parts of the argument are
placed in an appendix.



4 The term normalized probability density might be more accurate here, but measure has the
advantage of brevity. One may think of $\nu_{const}$ as the uniform probability distribution on
$\mathbb{Z}_N$, and $\nu$ as some other probability distribution which can concentrate on a subset of
$\mathbb{Z}_N$ of very small density (e.g. it may concentrate on the "almost primes" in $[1, N]$).

5 Actually, there is an extra technicality which is caused by the very irregular distribution of primes
in arithmetic progressions to small moduli (there are no primes congruent to 4 (mod 6), for example).
We get around this using something which we refer to as the W-trick, which basically consists of
restricting the primes to the arithmetic progression $n \equiv 1 \pmod{W}$, where
$W = \prod_{p \leq w(N)} p$ and $w(N)$ tends slowly to infinity with $N$. Although this looks like a
trick, it is actually an extremely important feature of that part of our argument which concerns primes.





The idea of using a majorant to study the primes is by no means new - indeed in some
sense sieve theory is precisely the study of such objects. For another use of a majorant
in an additive-combinatorial setting, see [33, 34].

It is now timely to make a few remarks concerning the proof of Theorem 3.5. It is in
the first step of the proof that our original investigations began, when we made a close
examination of Gowers' arguments. If $f : \mathbb{Z}_N \to \mathbb{R}^+$ is a function then the
normalised count of $k$-term arithmetic progressions

$\mathbb{E}(f(x) f(x + r) \cdots f(x + (k-1)r) \mid x, r \in \mathbb{Z}_N)$ (2.5)

is closely controlled by certain norms $\| \cdot \|_{U^d}$, which we would like to call the Gowers
uniformity norms⁶. They are defined in §5. The formal statement of this fact can be
called a generalised von Neumann theorem. Such a theorem, in the case $\nu = \nu_{const}$,
was proved by Gowers [19] as a first step in his proof of Szemeredi's theorem, using
$k - 2$ applications of the Cauchy-Schwarz inequality. In Proposition 5.3 we will prove
a generalised von Neumann theorem relative to an arbitrary pseudorandom measure $\nu$.
Our main tool is again the Cauchy-Schwarz inequality. We will use the term Gowers
uniform loosely to describe a function which is small in some $U^d$ norm. This should not
be confused with the term pseudorandom, which will be reserved for measures on $\mathbb{Z}_N$.

Sections 6-8 are devoted to concluding the proof of Theorem 3.5. Very roughly
the strategy will be to decompose the function $f$ under consideration into a Gowers
uniform component plus a bounded "Gowers anti-uniform" object (plus a negligible
error). The notion⁷ of Gowers anti-uniformity is captured using the dual norms $(U^d)^*$,
whose properties are laid out in §6.

The contribution of the Gowers-uniform part to the count (2.5)⁸ will be negligible by
the generalised von Neumann theorem. The contribution from the Gowers anti-uniform
component will be bounded from below by Szemeredi's theorem in its traditional form,
Proposition 2.3.

3. Pseudorandom measures 

In this section we specify exactly what we mean by a pseudorandom measure on $\mathbb{Z}_N$.
First, however, we set up some notation. We fix the length $k$ of the arithmetic
progressions we are seeking. $N = |\mathbb{Z}_N|$ will always be assumed to be prime and large (in
particular, we can invert any of the numbers $1, \dots, k$ in $\mathbb{Z}_N$), and we will write $o(1)$ for
a quantity that tends to zero as $N \to \infty$. We will write $O(1)$ for a bounded quantity.
Sometimes quantities of this type will tend to zero (resp. be bounded) in a way that
depends on some other, typically fixed, parameters. If there is any danger of confusion
as to what is being proved, we will indicate such dependence using subscripts, thus for
instance $O_{j,\varepsilon}(1)$ denotes a quantity whose magnitude is bounded by $C(j, \varepsilon)$ for some
quantity $C(j, \varepsilon) > 0$ depending only on $j, \varepsilon$. Since every quantity in this paper will depend
on $k$, however, we will not bother indicating the $k$ dependence throughout this paper.
As is customary we often abbreviate $O(1)X$ and $o(1)X$ as $O(X)$ and $o(X)$ respectively
for various non-negative quantities $X$.

6 Analogous objects have recently surfaced in the genuinely ergodic-theoretical work of Host and
Kra [27, 28] concerning non-conventional ergodic averages, thus enhancing the connection between
ergodic theory and additive number theory.

7 We note that Gowers uniformity, which is a measure of "randomness", "uniform distribution",
or "unbiasedness" in a function, should not be confused with the very different notion of uniform
boundedness. Indeed, in our arguments, the Gowers uniform functions will be highly unbounded,
whereas the Gowers anti-uniform functions will be uniformly bounded. Anti-uniformity can in fact be
viewed as a measure of "smoothness", "predictability", "structure", or "almost periodicity".

8 Using the language of ergodic theory we are essentially claiming that the Gowers anti-uniform
functions form a characteristic factor for the expression (2.5). The point is that even though $f$ is not
necessarily bounded uniformly, the fact that it is bounded pointwise by a pseudorandom measure $\nu$
allows us to conclude that the projection of $f$ to the Gowers anti-uniform component is bounded, at
which point we can invoke the standard Szemeredi theorem.

If $A$ is a finite non-empty set (for us $A$ is usually just $\mathbb{Z}_N$) and $f : A \to \mathbb{R}$ is a
function, we write $\mathbb{E}(f) := \mathbb{E}(f(x) \mid x \in A)$ for the average value of $f$, that is to say

$\mathbb{E}(f) := \frac{1}{|A|} \sum_{x \in A} f(x)$.

Here, as is usual, we write $|A|$ for the cardinality of the set $A$. More generally, if $P(x)$
is any statement concerning an element of $A$ which is true for at least one $x \in A$, we
define

$\mathbb{E}(f(x) \mid P(x)) := \frac{\sum_{x \in A : P(x)} f(x)}{|\{x \in A : P(x)\}|}$.

This notation extends to functions of several variables in the obvious manner. We
now define two notions of randomness for a measure, which we term the linear forms
condition and the correlation condition.

Definition 3.1 (Linear forms condition). Let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be a measure. Let $m_0$, $t_0$
and $L_0$ be small positive integer parameters. Then we say that $\nu$ satisfies the
$(m_0, t_0, L_0)$-linear forms condition if the following holds. Let $m \leq m_0$ and $t \leq t_0$ be
arbitrary, and suppose that $(L_{ij})_{1 \leq i \leq m, 1 \leq j \leq t}$ are arbitrary rational numbers with
numerator and denominator at most $L_0$ in absolute value, and that $b_i$, $1 \leq i \leq m$, are
arbitrary elements of $\mathbb{Z}_N$. For $1 \leq i \leq m$, let $\psi_i : \mathbb{Z}_N^t \to \mathbb{Z}_N$ be the linear forms
$\psi_i(\mathbf{x}) = \sum_{j=1}^{t} L_{ij} x_j + b_i$, where $\mathbf{x} = (x_1, \dots, x_t) \in \mathbb{Z}_N^t$, and where
the rational numbers $L_{ij}$ are interpreted as elements of $\mathbb{Z}_N$ in the usual manner (assuming
$N$ is prime and larger than $L_0$). Suppose that as $i$ ranges over $1, \dots, m$, the $t$-tuples
$(L_{ij})_{1 \leq j \leq t} \in \mathbb{Q}^t$ are non-zero, and no $t$-tuple is a rational multiple of any other.
Then we have

$\mathbb{E}(\nu(\psi_1(\mathbf{x})) \cdots \nu(\psi_m(\mathbf{x})) \mid \mathbf{x} \in \mathbb{Z}_N^t) = 1 + o_{L_0, m_0, t_0}(1)$. (3.1)

Note that the rate of decay in the $o(1)$ term is assumed to be uniform in the choice of
$b_1, \dots, b_m$.

Remarks. It is the parameter $m_0$, which controls the number of linear forms, that is
by far the most important, and will be kept relatively small. It will eventually be set
equal to $k \cdot 2^{k-1}$. Note that the $m = 1$ case of the linear forms condition recovers the
measure condition (2.4). Other simple examples of the linear forms condition which we
will encounter later are

$\mathbb{E}(\nu(x) \nu(x + h_1) \nu(x + h_2) \nu(x + h_1 + h_2) \mid x, h_1, h_2 \in \mathbb{Z}_N) = 1 + o(1)$ (3.2)

(here $(m_0, t_0, L_0) = (4, 3, 1)$);

$\mathbb{E}(\nu(x + h_1) \nu(x + h_2) \nu(x + h_1 + h_2) \mid h_1, h_2 \in \mathbb{Z}_N) = 1 + o(1)$ (3.3)

for all $x \in \mathbb{Z}_N$ (here $(m_0, t_0, L_0) = (3, 2, 1)$) and

$$\begin{aligned}
\mathbb{E}\big( & \nu((x - y)/2)\, \nu((x - y + h_2)/2)\, \nu(-y)\, \nu(-y - h_1) \\
& \times \nu((x - y')/2)\, \nu((x - y' + h_2)/2)\, \nu(-y')\, \nu(-y' - h_1) \\
& \times \nu(x)\, \nu(x + h_1)\, \nu(x + h_2)\, \nu(x + h_1 + h_2) \mid x, h_1, h_2, y, y' \in \mathbb{Z}_N \big) = 1 + o(1)
\end{aligned}$$ (3.4)

(here $(m_0, t_0, L_0) = (12, 5, 2)$). For those readers familiar with the Gowers uniformity
norms $U^{k-1}$ (which we shall discuss in detail later), the example (3.2) demonstrates that
$\nu$ is close to 1 in the $U^2$ norm (see Lemma 5.2). Similarly, the linear forms condition
with appropriately many parameters implies that $\nu$ is close to 1 in the $U^d$ norm, for any
fixed $d \geq 2$. However, the linear forms condition is much stronger than simply asserting
that $\|\nu - 1\|_{U^d}$ is small for various $d$.
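The simplest of these examples, (3.2), can be evaluated by brute force for small $N$. The sketch below is an illustration added here with hypothetical names. It compares the constant measure with a measure that puts all of its mass on a single point. Both have mean 1, but only the first gives a value close to 1 in (3.2).

```python
from fractions import Fraction
from itertools import product


def cube_average(nu, N: int) -> Fraction:
    """Left-hand side of (3.2): E(nu(x) nu(x+h1) nu(x+h2) nu(x+h1+h2) | x, h1, h2 in Z_N)."""
    total = sum(
        nu[x] * nu[(x + h1) % N] * nu[(x + h2) % N] * nu[(x + h1 + h2) % N]
        for x, h1, h2 in product(range(N), repeat=3)
    )
    return Fraction(total, N**3)


N = 7
nu_const = [1] * N              # the constant measure
nu_point = [N] + [0] * (N - 1)  # all mass at 0, still with E(nu) = 1

value_const = cube_average(nu_const, N)   # equals 1
value_point = cube_average(nu_point, N)   # equals N, far from 1 + o(1)
```

For the point mass only the term $x = h_1 = h_2 = 0$ is non-zero, giving $N^4/N^3 = N$. Such a measure satisfies (2.4) but fails the linear forms condition badly.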

For the application to the primes, the measure $\nu$ will be constructed using truncated
divisor sums, and the linear forms condition will be deduced from some arguments of
Goldston and Yildirim. From a probabilistic point of view, the linear forms condition is
asserting a type of joint independence between the "random variables" $\nu(\psi_j(\mathbf{x}))$; in the
application to the primes, $\nu$ will be concentrated on the "almost primes", and the linear
forms condition is then saying that the events "$\psi_j(\mathbf{x})$ is almost prime" are essentially
independent of each other as $j$ varies⁹.

Definition 3.2 (Correlation condition). Let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be a measure, and let $m_0$ be
a positive integer parameter. We say that $\nu$ satisfies the $m_0$-correlation condition if for
every $1 < m \leq m_0$ there exists a weight function $\tau = \tau_m : \mathbb{Z}_N \to \mathbb{R}^+$ which obeys the
moment conditions

$\mathbb{E}(\tau^q) = O_{m,q}(1)$ (3.5)

for all $1 \leq q < \infty$ and such that

$\mathbb{E}(\nu(x + h_1) \nu(x + h_2) \cdots \nu(x + h_m) \mid x \in \mathbb{Z}_N) \leq \sum_{1 \leq i < j \leq m} \tau(h_i - h_j)$ (3.6)

for all $h_1, \dots, h_m \in \mathbb{Z}_N$ (not necessarily distinct).

Remarks. The condition (3.6) may look a little strange, since if $\nu$ were to be chosen
randomly then we would expect such a condition to hold with $1 + o(1)$ on the
right-hand side, at least when $h_1, \dots, h_m$ are distinct. Note that one cannot use the linear
forms condition to control the left-hand side of (3.6) because the linear components
of the forms $x + h_j$ are all the same. The correlation condition has been designed
with the primes in mind¹⁰, because in that case we must tolerate slight "arithmetic"
nonuniformities. Observe, for example, that the number of $p \leq N$ for which $p - h$ is
also prime is not bounded above by a constant times $N/\log^2 N$ if $h$ contains a very
large number of prime factors, although such exceptions will of course be very rare and
one still expects to have moment conditions such as (3.5). It is phenomena like this
which prevent us from assuming an $L^\infty$ bound for $\tau$. While $m_0$ will be restricted to be
small (in fact, equal to $2^{k-1}$), it will be important for us that there is no upper bound
required on $q$ (which we will eventually need to be a very large function of $k$, but still
independent of $N$ of course). Since the correlation condition is an upper bound rather
than an asymptotic, it is fairly easy to obtain; we shall prove it using the arguments of
Goldston and Yildirim (since we are using those methods in any case to prove the linear
forms condition), but these upper bounds could also be obtained by more standard sieve
theory methods.

9 This will only be true after first eliminating some local correlations in the almost primes arising
from small divisors. This will be achieved by a simple "W-trick" which we will come to later in this
paper.

10 A simpler, but perhaps less interesting, model case occurs when one is trying to prove Szemeredi's
theorem relative to a random subset of $\{1, \dots, N\}$ of density $1/\log N$ (cf. [30]). The pseudorandom
weight $\nu$ would then be a Bernoulli random variable, with each $\nu(x)$ equal to $\log N$ with independent
probability $1/\log N$ and equal to $0$ otherwise. In such a case, we can (with high probability) bound
the left-hand side of (3.6) more cleanly by $O(1)$ (and even obtain the asymptotic $1 + o(1)$) when the
$h_j$ are distinct, and by $O(\log^m N)$ otherwise.

Definition 3.3 (Pseudorandom measures). Let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be a measure. We say
that $\nu$ is $k$-pseudorandom if it satisfies the $(k \cdot 2^{k-1}, 3k - 4, k)$-linear forms condition
and also the $2^{k-1}$-correlation condition.

Remarks. The exact values $k \cdot 2^{k-1}$, $3k - 4$, $k$, $2^{k-1}$ of the parameters chosen here are not
too important; in our application to the primes, any quantities which depend only on $k$
would suffice. It can be shown that if $C = C_k > 1$ is any constant independent of $N$ and
if $S \subseteq \mathbb{Z}_N$ is chosen at random, each $x \in \mathbb{Z}_N$ being selected to lie in $S$ independently
at random with probability $1/\log^C N$, then (with high probability) the measure
$\nu = \log^C N \cdot 1_S$ is $k$-pseudorandom, and the Hardy-Littlewood prime tuples conjecture can
be viewed as an assertion that the von Mangoldt function is essentially of this form
(once one eliminates the obvious obstructions to pseudorandomness coming from small
prime divisors). While we will of course not attempt to establish this conjecture here,
in §9 we will construct pseudorandom measures which are concentrated on the almost
primes instead of the primes; this is of course consistent with the so-called "fundamental
lemma of sieve theory", but we will need a rather precise variant of this lemma due to
Goldston and Yildirim.

The function $\nu_{const} \equiv 1$ is clearly $k$-pseudorandom for any $k$. In fact the pseudorandom
measures are star-shaped around the constant measure:

Lemma 3.4. Let $\nu$ be a $k$-pseudorandom measure. Then
$\nu_{1/2} := (\nu + \nu_{const})/2 = (\nu + 1)/2$ is also a $k$-pseudorandom measure (though possibly
with slightly different bounds in the $O()$ and $o()$ terms).

Proof. It is clear that $\nu_{1/2}$ is non-negative and has expectation $1 + o(1)$. To verify the
linear forms condition (3.1), we simply replace $\nu$ by $(\nu + 1)/2$ in the definition and
expand as a sum of $2^m$ terms, divided by $2^m$. Since each term can be verified to be
$1 + o(1)$ by the linear forms condition (3.1), the claim follows. The correlation condition
is verified in a similar manner. (A similar result holds for $(1 - \theta)\nu + \theta\nu_{const}$ for any
$0 \leq \theta \leq 1$, but we will not need to use this generalization.) □
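Written out, the expansion used in the proof reads as follows. This is a sketch in the notation of Definition 3.1, and the index set $S$ is introduced here only for illustration. Each subfamily $(\psi_i)_{i \in S}$ consists of at most $m_0$ forms that are still non-zero and pairwise non-proportional, so (3.1) applies to it, and the term $S = \emptyset$ is exactly 1.

```latex
\mathbb{E}\Big( \prod_{i=1}^{m} \nu_{1/2}(\psi_i(\mathbf{x})) \,\Big|\, \mathbf{x} \in \mathbb{Z}_N^t \Big)
  = \frac{1}{2^m} \sum_{S \subseteq \{1, \dots, m\}}
    \mathbb{E}\Big( \prod_{i \in S} \nu(\psi_i(\mathbf{x})) \,\Big|\, \mathbf{x} \in \mathbb{Z}_N^t \Big)
  = \frac{1}{2^m} \sum_{S \subseteq \{1, \dots, m\}} \big( 1 + o(1) \big)
  = 1 + o(1).
```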

The following result is one of the main theorems of the paper. It asserts that for
the purposes of Szemeredi's theorem (and ignoring $o(1)$ errors), there is no distinction
between a $k$-pseudorandom measure $\nu$ and the constant measure $\nu_{const}$.

Theorem 3.5 (Szemeredi's theorem relative to a pseudorandom measure). Let $k \geq 3$
and $0 < \delta \leq 1$ be fixed parameters. Suppose that $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ is $k$-pseudorandom. Let
$f : \mathbb{Z}_N \to \mathbb{R}^+$ be any non-negative function obeying the bound

$0 \leq f(x) \leq \nu(x)$ for all $x \in \mathbb{Z}_N$ (3.7)

and

$\mathbb{E}(f) \geq \delta$. (3.8)

Then we have

$\mathbb{E}(f(x) f(x + r) \cdots f(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) \geq c(k, \delta) - o_{k,\delta}(1)$ (3.9)

where $c(k, \delta) > 0$ is the same constant which appears in Proposition 2.3. (The decay
rate $o_{k,\delta}(1)$, on the other hand, decays significantly slower than that in Proposition 2.3,
and depends of course on the decay rates in the linear forms and correlation conditions).

We remark that while we do not explicitly assume that $N$ is large in Theorem 3.5, we
are free to do so since the conclusion (3.9) is trivial when $N = O_{k,\delta}(1)$. We certainly
encourage the reader to think of $N$ as being extremely large compared to other quantities
such as $k$ and $\delta$, and to think of $o(1)$ errors as being negligible.

The proof of this theorem will occupy the next few sections, §4-8. Interestingly, the
proof requires no Fourier analysis, additive combinatorics, or number theory; the
argument is instead a blend of quantitative ergodic theory arguments with some
combinatorial estimates related to Gowers uniformity and sparse hypergraph regularity. From §9
onwards we will apply this theorem to the specific case of the primes, by establishing a
pseudorandom majorant for (a modified version of) the von Mangoldt function.

4. Notation 

We now begin the proof of Theorem 3.5. Throughout this proof we fix the parameter
$k \geq 3$ and the probability density $\nu$ appearing in Theorem 3.5. All our constants in the
$O()$ and $o()$ notation are allowed to depend on $k$ (with all future dependence on this
parameter being suppressed), and are also allowed to depend on the bounds implicit in
the right-hand sides of (3.1) and (3.5). We may take $N$ to be sufficiently large with
respect to $k$ and $\delta$ since (3.9) is trivial otherwise.

We need some standard $L^q$ spaces.

Definition 4.1. For every $1 \leq q \leq \infty$ and $f : \mathbb{Z}_N \to \mathbb{R}$, we define the $L^q$ norms as

$\|f\|_{L^q} := \mathbb{E}(|f|^q)^{1/q}$

with the usual convention that $\|f\|_{L^\infty} := \sup_{x \in \mathbb{Z}_N} |f(x)|$. We let $L^q(\mathbb{Z}_N)$ be the
Banach space of all functions from $\mathbb{Z}_N$ to $\mathbb{R}$ equipped with the $L^q$ norm; of course since
$\mathbb{Z}_N$ is finite these spaces are all equal to each other as vector spaces, but the norms are only
equivalent up to powers of $N$. We also observe that $L^2(\mathbb{Z}_N)$ is a real Hilbert space with
the usual inner product

$\langle f, g \rangle := \mathbb{E}(fg)$.

If $\Omega$ is a subset of $\mathbb{Z}_N$, we use $1_\Omega : \mathbb{Z}_N \to \mathbb{R}$ to denote the indicator function of
$\Omega$, thus $1_\Omega(x) = 1$ if $x \in \Omega$ and $1_\Omega(x) = 0$ otherwise. Similarly if $P(x)$ is a statement
concerning an element $x \in \mathbb{Z}_N$, we write $1_{P(x)}$ for $1_{\{x \in \mathbb{Z}_N : P(x)\}}(x)$.

In our arguments we shall frequently be performing linear changes of variables and then
taking expectations. To facilitate this we adopt the following definition. Suppose that
$A$ and $B$ are finite non-empty sets and that $\Phi : A \to B$ is a map. Then we say that $\Phi$
is a uniform cover of $B$ by $A$ if $\Phi$ is surjective and all the fibers $\{\Phi^{-1}(b) : b \in B\}$ have
the same cardinality (i.e. they have cardinality $|A|/|B|$). Observe that if $\Phi$ is a uniform
cover of $B$ by $A$, then for any function $f : B \to \mathbb{R}$ we have

$\mathbb{E}(f(\Phi(a)) \mid a \in A) = \mathbb{E}(f(b) \mid b \in B)$. (4.1)
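The identity (4.1) is easy to test on a typical linear change of variables. The following Python sketch is an illustration added here. The map $(x, r) \mapsto x + 2r$ and the test function are arbitrary choices, and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_uniform_cover(phi, A, B) -> bool:
    """True if phi maps A onto B and all fibers phi^{-1}(b) have the same size."""
    fiber_sizes = Counter(phi(a) for a in A)
    return set(fiber_sizes) == set(B) and len(set(fiber_sizes.values())) == 1


def average(f, A) -> Fraction:
    """E(f(a) | a in A) for an integer-valued function f."""
    return Fraction(sum(f(a) for a in A), len(A))


N = 7
Z_N = list(range(N))
pairs = list(product(Z_N, repeat=2))


def phi(xr):
    """The linear change of variables (x, r) -> x + 2r on Z_N^2."""
    return (xr[0] + 2 * xr[1]) % N


def f(b):
    """An arbitrary test function on Z_N."""
    return b * b


lhs = average(lambda a: f(phi(a)), pairs)   # E(f(phi(a)) | a in A)
rhs = average(f, Z_N)                       # E(f(b) | b in B)
```

By contrast, the squaring map $x \mapsto x^2$ on $\mathbb{Z}_7$ is not surjective, so it is not a uniform cover and (4.1) need not hold for it.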

5. Gowers uniformity norms, and a generalized von Neumann theorem

As mentioned in earlier sections, the proof of Theorem 3.5 relies on splitting the given
function $f$ into a Gowers uniform component and a Gowers anti-uniform component.
We will come to this splitting in later sections, but for this section we focus on defining
the notion of Gowers uniformity, introduced in [18, 19]. The main result of this section
will be a generalized von Neumann theorem (Proposition 5.3), which basically asserts
that Gowers uniform functions are negligible for the purposes of computing sums such
as (3.9).

Definition 5.1. Let d ^ be a dimensionEE We let {0, l} d be the standard discrete 
d-dimensional cube, consisting of <i-tuples u = (u>i, . . . ,Ud) where ujj G {0, 1} for j = 
l,...,d. If h = (h, . . . , h d ) G Z% we define u ■ h := io x hx + . . . + uj d h d - If (/a,)a,e{o,i} d 
is a {0, 1} -tuple of functions in L°°(Ztv), we define the d- dimensional Gowers inner 
product ((fuj)uje{o,i} d )u d by the formula 



((^)^6{0,l} d ){/ d := E ( II fu>{x+U-h) 

^e{o,i} d 



xeZ N ,heZ d N ). (5.1) 



Henceforth we shall refer to a configuration {x + u ■ h : uo G {0, l} d } as a cube of 
dimension d. 

Example. When d = 2, we have 

(Zoo, fio, foi, fn)u* = E(/oo(x)/i (a; + hi)f m (x + h 2 )f n (x + hi + h 2 ) | x, hi, h 2 G Z N ). 

We recall from [19] the positivity properties of the Gowers inner product (15.11) when 
d ^ 1 (the d = case being trivial). First suppose that f w does not depend on the final 
digit Ud of uj, thus f u = f^,...,^^- Then we may rewrite (15.11) as 

((^u { o,i} d )c/ d =®( n + j • h '^"'( x + h d+^'- *>') 

^w'e{o,i} d - 1 

x G Z N ,ti G Z d -\h d G Z N ^j, 

where we write u' := (ui, . . . , Ua-i) and h' := (hi, . . . , hd-i)- This can be rewritten 
further as 



((/^ 6{ o,i} d >i/ d = EMe( n U'(y + J ■ h')\y G Z N ) 
^ w'e{o,i} d - 1 



ti ez^ 1 , (5.2) 



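so in particular we have the positivity property $\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d} \geq 0$ when $f_\omega$ is
independent of $\omega_d$. This proves the positivity property

\[ \langle (f)_{\omega \in \{0,1\}^d} \rangle_{U^d} \geq 0 \qquad (5.3) \]

when $d \geq 1$. We can thus define the Gowers uniformity norm $\|f\|_{U^d}$ of a function
$f : \mathbb{Z}_N \to \mathbb{R}$ by the formula

\[ \|f\|_{U^d} := \langle (f)_{\omega \in \{0,1\}^d} \rangle_{U^d}^{1/2^d} = \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^d} f(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^d \Big)^{1/2^d}. \qquad (5.4) \]

11 In practice, we will have $d = k - 1$, where $k$ is the length of the arithmetic progressions under
consideration.

When $f_\omega$ does depend on $\omega_d$, (5.2) must be rewritten as

\[ \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d} = \mathbb{E}\Big( \mathbb{E}\Big( \prod_{\omega' \in \{0,1\}^{d-1}} f_{\omega',0}(y + \omega' \cdot h') \,\Big|\, y \in \mathbb{Z}_N \Big) \times \mathbb{E}\Big( \prod_{\omega' \in \{0,1\}^{d-1}} f_{\omega',1}(y + \omega' \cdot h') \,\Big|\, y \in \mathbb{Z}_N \Big) \,\Big|\, h' \in \mathbb{Z}_N^{d-1} \Big). \]

From the Cauchy-Schwarz inequality in the $h'$ variables, we thus see that

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d}| \leq \langle (f_{\omega',0})_{\omega \in \{0,1\}^d} \rangle_{U^d}^{1/2} \, \langle (f_{\omega',1})_{\omega \in \{0,1\}^d} \rangle_{U^d}^{1/2}. \]

Similarly if we replace the role of the $\omega_d$ digit by any of the other digits. Applying this
Cauchy-Schwarz inequality once in each digit, we obtain the Gowers Cauchy-Schwarz
inequality

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d}| \leq \prod_{\omega \in \{0,1\}^d} \|f_\omega\|_{U^d}. \qquad (5.5) \]

From the multilinearity of the inner product, and the binomial formula, we then obtain
the inequality

\[ |\langle (f + g)_{\omega \in \{0,1\}^d} \rangle_{U^d}| \leq (\|f\|_{U^d} + \|g\|_{U^d})^{2^d} \]

whence we obtain the Gowers triangle inequality

\[ \|f + g\|_{U^d} \leq \|f\|_{U^d} + \|g\|_{U^d} \]

(cf. [19] Lemmas 3.8 and 3.9).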
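The definitions (5.1) and (5.4), together with the Gowers Cauchy-Schwarz inequality (5.5) and the triangle inequality, can be sanity-checked numerically for small parameters. The following is a minimal brute-force sketch, assuming the illustrative values `N = 5`, `d = 2` and random real test functions; the helper names `gowers_inner` and `u_norm` are hypothetical.

```python
import random
from itertools import product

N, d = 5, 2
random.seed(0)
cube = list(product((0, 1), repeat=d))

def gowers_inner(fs):
    """Gowers inner product (5.1); fs maps each omega in {0,1}^d to a list of N reals."""
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega, f in fs.items():
                shift = sum(w * hj for w, hj in zip(omega, h))
                term *= f[(x + shift) % N]
            total += term
    return total / N ** (d + 1)

def u_norm(f):
    """Gowers uniformity norm (5.4); max() guards against tiny negative rounding errors."""
    return max(gowers_inner({omega: f for omega in cube}), 0.0) ** (1.0 / 2 ** d)

# One random function per vertex of the cube.
fs = {omega: [random.uniform(-1, 1) for _ in range(N)] for omega in cube}

lhs = abs(gowers_inner(fs))          # left-hand side of (5.5)
rhs = 1.0
for f_omega in fs.values():          # right-hand side of (5.5)
    rhs *= u_norm(f_omega)

f, g = fs[(0, 0)], fs[(1, 1)]
f_plus_g = [a + b for a, b in zip(f, g)]
triangle_ok = u_norm(f_plus_g) <= u_norm(f) + u_norm(g) + 1e-12
```

The cost is of order $2^d N^{d+1}$ operations, so this is only practical for very small $N$ and $d$.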

Example. Continuing the $d = 2$ example, we have

\[ \|f\|_{U^2} := \mathbb{E}\big( f(x) f(x + h_1) f(x + h_2) f(x + h_1 + h_2) \mid x, h_1, h_2 \in \mathbb{Z}_N \big)^{1/4} \]

and the Gowers Cauchy-Schwarz inequality then states

\[ \big| \mathbb{E}\big( f_{00}(x) f_{10}(x + h_1) f_{01}(x + h_2) f_{11}(x + h_1 + h_2) \mid x, h_1, h_2 \in \mathbb{Z}_N \big) \big| \leq \|f_{00}\|_{U^2} \|f_{10}\|_{U^2} \|f_{01}\|_{U^2} \|f_{11}\|_{U^2}. \]

Applying this with $f_{10}, f_{01}, f_{11}$ set equal to Kronecker delta functions, one can easily
verify that

\[ f_{00} \equiv 0 \text{ whenever } \|f_{00}\|_{U^2} = 0. \]

This, combined with the preceding discussion, shows that the $U^2$ norm is indeed a
genuine norm. This can also be seen by the easily verified identity

\[ \|f\|_{U^2} = \Big( \sum_{\xi \in \mathbb{Z}_N} |\hat f(\xi)|^4 \Big)^{1/4} \]

(cf. [19], Lemma 2.2), where the Fourier transform$^{12}$ $\hat f : \mathbb{Z}_N \to \mathbb{C}$ of $f$ is defined by the
formula

\[ \hat f(\xi) := \mathbb{E}\big( f(x) e^{-2\pi i x \xi / N} \mid x \in \mathbb{Z}_N \big) \]

for any $\xi \in \mathbb{Z}_N$.
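The Fourier identity for the $U^2$ norm can be verified directly for a small modulus. This sketch assumes the illustrative values `N = 7` and an arbitrary real test function `f` (both hypothetical choices); it compares the fourth power of the $U^2$ norm, computed from the definition, with the sum of fourth powers of the Fourier coefficients.

```python
import cmath
from itertools import product

N = 7
f = [((3 * x * x + x) % N) - 3.0 for x in range(N)]   # arbitrary real test function

def fourier(f, xi):
    """Fourier coefficient with the normalisation E(f(x) e^{-2 pi i x xi / N})."""
    return sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N

# ||f||_{U^2}^4 from the definition: an average over all 2-dimensional cubes.
u2_fourth = sum(
    f[x] * f[(x + h1) % N] * f[(x + h2) % N] * f[(x + h1 + h2) % N]
    for x, h1, h2 in product(range(N), repeat=3)
) / N ** 3

# Fourier side: sum over all frequencies of |f^(xi)|^4.
fourier_side = sum(abs(fourier(f, xi)) ** 4 for xi in range(N))
```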

We return to the study of general $U^d$ norms. Since

\[ \|\nu_{\mathrm{const}}\|_{U^d} = \|1\|_{U^d} = 1, \qquad (5.6) \]

we see from (5.5) that

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d}| \leq \|f\|_{U^d}^{2^{d-1}} \]

where $f_\omega := 1$ when $\omega_d = 1$ and $f_\omega := f$ when $\omega_d = 0$. But the left-hand side can easily
be computed to be $\|f\|_{U^{d-1}}^{2^{d-1}}$, and thus we have the monotonicity relation

\[ \|f\|_{U^{d-1}} \leq \|f\|_{U^d} \qquad (5.7) \]

for all $d \geq 2$. Since the $U^2$ norm was already shown to be strictly positive, we see that
the higher norms $U^d$, $d \geq 2$ are also. Thus the $U^d$ norms are genuinely norms for all
$d \geq 2$. On the other hand, the $U^1$ norm is not actually a norm, since one can compute
from (5.4) that $\|f\|_{U^1} = |\mathbb{E}(f)|$ and thus $\|f\|_{U^1}$ may vanish without $f$ itself vanishing.
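A short numerical check of the monotonicity relation (5.7) and of the formula for the $U^1$ norm may be helpful here. The sketch below is only an illustration under assumed values (`N = 5`, a random real function `f`); the helper `u_norm` is a hypothetical brute-force implementation of (5.4).

```python
import random
from itertools import product

N = 5
random.seed(1)
f = [random.uniform(-1, 1) for _ in range(N)]

def u_norm(f, d):
    """Brute-force Gowers U^d norm (5.4) of a real function on Z_N."""
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                term *= f[(x + sum(w * hj for w, hj in zip(omega, h))) % N]
            total += term
    # max() guards against tiny negative values caused by rounding.
    return max(total / N ** (d + 1), 0.0) ** (1.0 / 2 ** d)

mean_f = sum(f) / N
norms = [u_norm(f, d) for d in (1, 2, 3)]   # U^1, U^2, U^3 norms of f
```

Replacing `f` by a non-zero function of mean zero gives a vanishing $U^1$ norm, which illustrates why $U^1$ is only a seminorm.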

From the linear forms condition one can easily verify that $\|\nu\|_{U^d} = 1 + o(1)$ (cf. (3.2)).
In fact more is true, namely that pseudorandom measures $\nu$ are close to the constant
measure $\nu_{\mathrm{const}}$ in the $U^d$ norms; this is of course consistent with our philosophy of
deducing Theorem 3.5 from Theorem 2.3.

Lemma 5.2. Suppose that $\nu$ is $k$-pseudorandom (as defined in Definition 3.3). Then
we have

\[ \|\nu - \nu_{\mathrm{const}}\|_{U^d} = \|\nu - 1\|_{U^d} = o(1) \qquad (5.8) \]

for all $1 \leq d \leq k - 1$.

Proof. By (5.7) it suffices to prove the claim for $d = k - 1$. Raising to the power $2^{k-1}$,
it suffices from (5.4) to show that

\[ \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1}} (\nu(x + \omega \cdot h) - 1) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = o(1). \]

The left-hand side can be expanded as

\[ \sum_{A \subseteq \{0,1\}^{k-1}} (-1)^{|A|} \, \mathbb{E}\Big( \prod_{\omega \in A} \nu(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big). \qquad (5.9) \]

Let us look at the expression

\[ \mathbb{E}\Big( \prod_{\omega \in A} \nu(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) \qquad (5.10) \]

for some fixed $A \subseteq \{0,1\}^{k-1}$. This is of the form

\[ \mathbb{E}\big( \nu(\psi_1(\mathbf{x})) \cdots \nu(\psi_{|A|}(\mathbf{x})) \mid \mathbf{x} \in \mathbb{Z}_N^k \big) \]

where $\mathbf{x} := (x, h_1, \ldots, h_{k-1})$ and the $\psi_1, \ldots, \psi_{|A|}$ are some ordering of the $|A|$ linear forms
$x + \omega \cdot h$, $\omega \in A$. It is clear that none of these forms is a rational multiple of any other.
Thus we may invoke the $(2^{k-1}, k, 1)$-linear forms condition, which is a consequence of
the fact that $\nu$ is $k$-pseudorandom, to conclude that the expression (5.10) is $1 + o(1)$.

Referring back to (5.9), one sees that the claim now follows from the binomial theorem

\[ \sum_{A \subseteq \{0,1\}^{k-1}} (-1)^{|A|} = (1 - 1)^{2^{k-1}} = 0. \]

12 The Fourier transform of course plays a hugely important role in the $k = 3$ theory, and provides
some very useful intuition to then think about the higher $k$ theory, but will not be used in this paper
except as motivation.

It is now time to state and prove our "generalised von Neumann theorem", which
explains how the expression (3.9), which counts $k$-term arithmetic progressions, is governed
by the Gowers uniformity norms. All of this, of course, is relative to a pseudorandom
measure $\nu$.

Proposition 5.3 (Generalised von Neumann). Suppose that $\nu$ is $k$-pseudorandom. Let
$f_0, \ldots, f_{k-1} \in L^1(\mathbb{Z}_N)$ be functions which are pointwise bounded by $\nu + \nu_{\mathrm{const}}$, or in other
words

\[ |f_j(x)| \leq \nu(x) + 1 \text{ for all } x \in \mathbb{Z}_N, \ 0 \leq j \leq k - 1. \qquad (5.11) \]

Let $c_0, \ldots, c_{k-1}$ be a permutation of some $k$ consecutive elements of $\{-k+1, \ldots, -1, 0, 1, \ldots, k-1\}$
(in practice we will take $c_j := j$). Then

\[ \mathbb{E}\Big( \prod_{j=0}^{k-1} f_j(x + c_j r) \,\Big|\, x, r \in \mathbb{Z}_N \Big) = O\Big( \inf_{0 \leq j \leq k-1} \|f_j\|_{U^{k-1}} \Big) + o(1). \]

Remark. This proposition is standard when $\nu = \nu_{\mathrm{const}}$ (see for instance [19, Theorem
3.2] or, for an analogous result in the ergodic setting, [13, Theorem 3.1]). The novelty
is thus the extension to the pseudorandom $\nu$ studied in Theorem 3.5. The reason
we have an upper bound of $\nu(x) + 1$ instead of $\nu(x)$ is because we shall be applying
this lemma to functions $f_j$ which roughly have the form $f_j = f - \mathbb{E}(f|\mathcal{B})$, where $f$
is some function bounded pointwise by $\nu$, and $\mathcal{B}$ is a $\sigma$-algebra such that $\mathbb{E}(\nu|\mathcal{B})$ is
essentially bounded (up to $o(1)$ errors) by 1, so that we can essentially bound $|f_j|$ by
$\nu(x) + 1$; see Definition 7.1 for the notations we are using here. The techniques here are
inspired by similar Cauchy-Schwarz arguments relative to pseudorandom hypergraphs
in [20]. Indeed, the estimate here can be viewed as a kind of "sparse counting lemma"
that utilises a regularity hypothesis (in the guise of $U^{k-1}$ control on one of the $f_j$) to
obtain control on an expression which can be viewed as a weighted count of arithmetic
progressions concentrated in a sparse set (the support of $\nu$). See [20, 30] for some further
examples of such lemmas.

Proof. By replacing $\nu$ with $(\nu + 1)/2$ (and by dividing $f_j$ by 2), and using Lemma 3.4,
we see that we may in fact assume without loss of generality that we can improve (5.11)
to

\[ |f_j(x)| \leq \nu(x) \text{ for all } x \in \mathbb{Z}_N, \ 0 \leq j \leq k - 1. \qquad (5.12) \]

For similar reasons we may assume that $\nu$ is strictly positive everywhere.

By permuting the $f_j$ and $c_j$ if necessary, we may assume that the infimum

\[ \inf_{0 \leq j \leq k-1} \|f_j\|_{U^{k-1}} \]

is attained when $j = 0$. By shifting $x$ by $c_0 r$ if necessary we may assume that $c_0 = 0$.
Our task is thus to show

\[ \mathbb{E}\Big( \prod_{j=0}^{k-1} f_j(x + c_j r) \,\Big|\, x, r \in \mathbb{Z}_N \Big) = O(\|f_0\|_{U^{k-1}}) + o(1). \qquad (5.13) \]

The proof of this will fall into two parts. First of all we will use the Cauchy-Schwarz
inequality $k - 1$ times (as is standard in the proof of theorems of this general type).
In this way we will bound the left hand side of (5.13) by a weighted sum of $f_0$ over
$(k - 1)$-dimensional cubes. After that, we will show using the linear forms condition
that these weights are roughly 1 on average, which will enable us to deduce (5.13).

Before we give the full proof, let us first give the argument in the case $k = 3$, with $c_j := j$.
This is conceptually no easier than the general case, but the notation is substantially
less fearsome. Our task is to show that

\[ \mathbb{E}\big( f_0(x) f_1(x + r) f_2(x + 2r) \mid x, r \in \mathbb{Z}_N \big) = O(\|f_0\|_{U^2}) + o(1). \]

It shall be convenient to reparameterise the progression $(x, x + r, x + 2r)$ as
$(y_1 + y_2, y_2/2, -y_1)$. The fact that the term $y_2/2$ (the argument of $f_1$) does not depend on $y_1$
and the term $-y_1$ (the argument of $f_2$) does not depend on $y_2$ will allow us to perform
Cauchy-Schwarz in the arguments below without further changes of variable. Since $N$ is a
large prime, we are now faced with estimating the quantity

\[ J_0 := \mathbb{E}\big( f_0(y_1 + y_2) f_1(y_2/2) f_2(-y_1) \mid y_1, y_2 \in \mathbb{Z}_N \big). \qquad (5.14) \]
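The reparameterisation used in (5.14) can be checked mechanically. The sketch below assumes the small prime `N = 11` (an illustrative choice) and confirms that every triple $(y_1 + y_2, y_2/2, -y_1)$ is a three-term progression in $\mathbb{Z}_N$, and that each pair $(x, r)$ arises from exactly one pair $(y_1, y_2)$, so that the averages over $(x, r)$ and over $(y_1, y_2)$ coincide.

```python
N = 11                       # a prime, so that 2 is invertible mod N
inv2 = pow(2, -1, N)         # multiplicative inverse of 2 in Z_N (Python 3.8+)

def progression(y1, y2):
    """The triple (y1 + y2, y2/2, -y1) in Z_N."""
    return ((y1 + y2) % N, (y2 * inv2) % N, (-y1) % N)

pairs = set()
all_aps = True
for y1 in range(N):
    for y2 in range(N):
        a, b, c = progression(y1, y2)
        r = (b - a) % N
        all_aps &= (c - b) % N == r       # both gaps equal the common difference r
        pairs.add((a, r))                 # the pair (x, r) recovered from (y1, y2)
```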

We estimate $f_2$ in absolute value by $\nu$ and bound this by

\[ |J_0| \leq \mathbb{E}\big( |\mathbb{E}(f_0(y_1 + y_2) f_1(y_2/2) \mid y_2 \in \mathbb{Z}_N)| \, \nu(-y_1) \mid y_1 \in \mathbb{Z}_N \big). \]

Using Cauchy-Schwarz and (2.4), we can bound this by

\[ (1 + o(1)) \, \mathbb{E}\big( |\mathbb{E}(f_0(y_1 + y_2) f_1(y_2/2) \mid y_2 \in \mathbb{Z}_N)|^2 \, \nu(-y_1) \mid y_1 \in \mathbb{Z}_N \big)^{1/2} \]

which we rewrite as $(1 + o(1)) J_1^{1/2}$, where

\[ J_1 := \mathbb{E}\big( f_0(y_1 + y_2) f_0(y_1 + y_2') f_1(y_2/2) f_1(y_2'/2) \nu(-y_1) \mid y_1, y_2, y_2' \in \mathbb{Z}_N \big). \]

We now estimate $f_1$ in absolute value by $\nu$, and thus bound

\[ J_1 \leq \mathbb{E}\big( |\mathbb{E}(f_0(y_1 + y_2) f_0(y_1 + y_2') \nu(-y_1) \mid y_1 \in \mathbb{Z}_N)| \, \nu(y_2/2) \nu(y_2'/2) \mid y_2, y_2' \in \mathbb{Z}_N \big). \]

Using Cauchy-Schwarz and (2.4) again, we bound this by $1 + o(1)$ times

\[ \mathbb{E}\big( |\mathbb{E}(f_0(y_1 + y_2) f_0(y_1 + y_2') \nu(-y_1) \mid y_1 \in \mathbb{Z}_N)|^2 \, \nu(y_2/2) \nu(y_2'/2) \mid y_2, y_2' \in \mathbb{Z}_N \big)^{1/2}. \]

Putting all this together, we conclude the inequality

\[ |J_0| \leq (1 + o(1)) J_2^{1/4}, \qquad (5.15) \]

where

\[ J_2 := \mathbb{E}\big( f_0(y_1 + y_2) f_0(y_1 + y_2') f_0(y_1' + y_2) f_0(y_1' + y_2') \, \nu(-y_1) \nu(-y_1') \nu(y_2/2) \nu(y_2'/2) \mid y_1, y_1', y_2, y_2' \in \mathbb{Z}_N \big). \]

If it were not for the weights involving $\nu$, $J_2$ would be the $U^2$ norm of $f_0$, and we
would be done. If we reparameterise the cube $(y_1 + y_2, y_1' + y_2, y_1 + y_2', y_1' + y_2')$ by
$(x, x + h_1, x + h_2, x + h_1 + h_2)$, the above expression becomes

\[ J_2 = \mathbb{E}\big( f_0(x) f_0(x + h_1) f_0(x + h_2) f_0(x + h_1 + h_2) W(x, h_1, h_2) \mid x, h_1, h_2 \in \mathbb{Z}_N \big) \]

where $W(x, h_1, h_2)$ is the quantity

\[ W(x, h_1, h_2) := \mathbb{E}\big( \nu(-y) \nu(-y - h_1) \nu((x - y)/2) \nu((x - y - h_2)/2) \mid y \in \mathbb{Z}_N \big). \qquad (5.16) \]

In order to compare $J_2$ to $\|f_0\|_{U^2}^4$, we must compare $W$ to 1. To that end it suffices to
show that the error

\[ \mathbb{E}\big( f_0(x) f_0(x + h_1) f_0(x + h_2) f_0(x + h_1 + h_2) (W(x, h_1, h_2) - 1) \mid x, h_1, h_2 \in \mathbb{Z}_N \big) \]

is suitably small (in fact it will be $o(1)$). To achieve this we estimate $f_0$ in absolute
value by $\nu$ and use Cauchy-Schwarz one last time to reduce to showing that

\[ \mathbb{E}\big( \nu(x) \nu(x + h_1) \nu(x + h_2) \nu(x + h_1 + h_2) (W(x, h_1, h_2) - 1)^n \mid x, h_1, h_2 \in \mathbb{Z}_N \big) = 0^n + o(1) \]

for $n = 0, 2$. Expanding out the $W - 1$ term, it suffices to show that

\[ \mathbb{E}\big( \nu(x) \nu(x + h_1) \nu(x + h_2) \nu(x + h_1 + h_2) W(x, h_1, h_2)^q \mid x, h_1, h_2 \in \mathbb{Z}_N \big) = 1 + o(1) \]

for $q = 0, 1, 2$. But this follows from the linear forms condition (for instance, the case
$q = 2$ is just (3.4)).

We turn now to the proof of (5.13) in general. As one might expect in view of the above
discussion, this shall consist of a large number of applications of Cauchy-Schwarz to
replace all the functions $f_j$ with $\nu$, and then applications of the linear forms condition.
In order to expedite these applications of Cauchy-Schwarz we shall need some notation.
Suppose that $0 \leq d \leq k - 1$, and that we have two vectors $y = (y_1, \ldots, y_{k-1}) \in \mathbb{Z}_N^{k-1}$
and $y' = (y'_{k-d}, \ldots, y'_{k-1}) \in \mathbb{Z}_N^d$ of length $k - 1$ and $d$ respectively. For any set
$S \subseteq \{k - d, \ldots, k - 1\}$, we define the vector $y^{(S)} = (y_1^{(S)}, \ldots, y_{k-1}^{(S)}) \in \mathbb{Z}_N^{k-1}$ as

\[ y_i^{(S)} := \begin{cases} y_i & \text{if } i \notin S \\ y_i' & \text{if } i \in S. \end{cases} \]

The set $S$ thus indicates which components of $y^{(S)}$ come from $y'$ rather than $y$.

Lemma 5.4 (Cauchy-Schwarz). Let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be any measure. Let $\phi_0, \phi_1, \ldots, \phi_{k-1} :
\mathbb{Z}_N^{k-1} \to \mathbb{Z}_N$ be functions of $k - 1$ variables $y_1, \ldots, y_{k-1}$, such that $\phi_i$ does not depend
on $y_i$ for $1 \leq i \leq k - 1$. Suppose that $f_0, f_1, \ldots, f_{k-1} \in L^1(\mathbb{Z}_N)$ are functions satisfying
$|f_i(x)| \leq \nu(x)$ for all $x \in \mathbb{Z}_N$ and for each $0 \leq i \leq k - 1$. For each $0 \leq d \leq k - 1$,
define the quantities

\[ J_d := \mathbb{E}\Big( \prod_{S \subseteq \{k-d, \ldots, k-1\}} \Big( \prod_{i=0}^{k-d-1} f_i(\phi_i(y^{(S)})) \Big) \Big( \prod_{i=k-d}^{k-1} \nu^{1/2}(\phi_i(y^{(S)})) \Big) \,\Big|\, y \in \mathbb{Z}_N^{k-1}, y' \in \mathbb{Z}_N^d \Big) \qquad (5.17) \]

and

\[ P_d := \mathbb{E}\Big( \prod_{S \subseteq \{k-d, \ldots, k-1\}} \nu(\phi_{k-d-1}(y^{(S)})) \,\Big|\, y \in \mathbb{Z}_N^{k-1}, y' \in \mathbb{Z}_N^d \Big). \qquad (5.18) \]

Then for any $0 \leq d \leq k - 2$, we have the inequality

\[ |J_d|^2 \leq P_d J_{d+1}. \qquad (5.19) \]

Remarks. The appearance of $\nu^{1/2}$ in (5.17) may seem odd. Note, however, that since
$\phi_i$ does not depend on the $i$th variable, each factor of $\nu^{1/2}$ in (5.17) occurs twice. If one
takes $k = 3$ and

\[ \phi_0(y_1, y_2) = y_1 + y_2, \quad \phi_1(y_1, y_2) = y_2/2, \quad \phi_2(y_1, y_2) = -y_1, \qquad (5.20) \]

then the above notation is consistent with the quantities $J_0, J_1, J_2$ defined in the
preceding discussion.



Proof of Lemma 5.4. Consider the quantity $J_d$. Since $\phi_{k-d-1}$ does not depend on
$y_{k-d-1}$, we may take all quantities depending on $\phi_{k-d-1}$ outside of the $y_{k-d-1}$ average.
This allows us to write

\[ J_d = \mathbb{E}\big( G(y, y') H(y, y') \mid y_1, \ldots, y_{k-d-2}, y_{k-d}, \ldots, y_{k-1}, y'_{k-d}, \ldots, y'_{k-1} \in \mathbb{Z}_N \big), \]

where

\[ G(y, y') := \prod_{S \subseteq \{k-d, \ldots, k-1\}} f_{k-d-1}(\phi_{k-d-1}(y^{(S)})) \, \nu^{-1/2}(\phi_{k-d-1}(y^{(S)})) \]

and

\[ H(y, y') := \mathbb{E}\Big( \prod_{S \subseteq \{k-d, \ldots, k-1\}} \prod_{i=0}^{k-d-2} f_i(\phi_i(y^{(S)})) \prod_{i=k-d-1}^{k-1} \nu^{1/2}(\phi_i(y^{(S)})) \,\Big|\, y_{k-d-1} \in \mathbb{Z}_N \Big) \]

(note we have multiplied and divided by several factors of the form $\nu^{1/2}(\phi_{k-d-1}(y^{(S)}))$).
Now apply Cauchy-Schwarz to give

\[ |J_d|^2 \leq \mathbb{E}\big( |G(y, y')|^2 \mid y_1, \ldots, y_{k-d-2}, y_{k-d}, \ldots, y_{k-1}, y'_{k-d}, \ldots, y'_{k-1} \in \mathbb{Z}_N \big) \times \mathbb{E}\big( |H(y, y')|^2 \mid y_1, \ldots, y_{k-d-2}, y_{k-d}, \ldots, y_{k-1}, y'_{k-d}, \ldots, y'_{k-1} \in \mathbb{Z}_N \big). \]

Since $|f_{k-d-1}(x)| \leq \nu(x)$ for all $x$, one sees from (5.18) that

\[ \mathbb{E}\big( |G(y, y')|^2 \mid y_1, \ldots, y_{k-d-2}, y_{k-d}, \ldots, y_{k-1}, y'_{k-d}, \ldots, y'_{k-1} \in \mathbb{Z}_N \big) \leq P_d \]

(note that the $y_{k-d-1}$ averaging in (5.18) is redundant since $\phi_{k-d-1}$ does not depend on
this variable). Moreover, by writing in the definition of $H(y, y')$ and expanding out the
square, replacing the averaging variable $y_{k-d-1}$ with the new variables $y_{k-d-1}, y'_{k-d-1}$,
one sees from (5.17) that

\[ \mathbb{E}\big( |H(y, y')|^2 \mid y_1, \ldots, y_{k-d-2}, y_{k-d}, \ldots, y_{k-1}, y'_{k-d}, \ldots, y'_{k-1} \in \mathbb{Z}_N \big) = J_{d+1}. \]

The claim follows. □
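In the case $k = 3$, with the forms (5.20), the two applications of (5.19) read $|J_0|^2 \leq P_0 J_1$ and $|J_1|^2 \leq P_1 J_2$, and both hold exactly for any positive weight. The following brute-force sketch checks them numerically; the modulus `N = 5`, the random weight `nu` and the functions `f0, f1, f2` (scaled so that $|f_i| \leq \nu$) are hypothetical test data, not objects from the paper.

```python
import random
from itertools import product

N = 5
inv2 = pow(2, -1, N)                                  # 1/2 in Z_N (Python 3.8+)
random.seed(2)
nu = [random.uniform(0.5, 1.5) for _ in range(N)]     # strictly positive weight
# Three test functions with |f_i(x)| <= nu(x) pointwise.
f0, f1, f2 = ([random.uniform(-1, 1) * nu[x] for x in range(N)] for _ in range(3))

def avg(term, nvars):
    """Average of term over all nvars-tuples of elements of Z_N."""
    vals = [term(*v) for v in product(range(N), repeat=nvars)]
    return sum(vals) / len(vals)

def half(y):
    return (y * inv2) % N

# J_0, J_1, J_2 as in the k = 3 discussion; z1, z2 play the role of y_1', y_2'.
J0 = avg(lambda y1, y2: f0[(y1 + y2) % N] * f1[half(y2)] * f2[-y1 % N], 2)
J1 = avg(lambda y1, y2, z2: f0[(y1 + y2) % N] * f0[(y1 + z2) % N]
         * f1[half(y2)] * f1[half(z2)] * nu[-y1 % N], 3)
J2 = avg(lambda y1, z1, y2, z2: f0[(y1 + y2) % N] * f0[(y1 + z2) % N]
         * f0[(z1 + y2) % N] * f0[(z1 + z2) % N]
         * nu[-y1 % N] * nu[-z1 % N] * nu[half(y2)] * nu[half(z2)], 4)

P0 = sum(nu) / N        # (5.18) with d = 0: E(nu(-y_1)) = E(nu)
P1 = P0 ** 2            # (5.18) with d = 1: E(nu(y_2/2) nu(y_2'/2)) = E(nu)^2
```

Here the weight is not required to be pseudorandom, since (5.19) is a pure Cauchy-Schwarz statement; pseudorandomness only enters later, when the $P_d$ are shown to be $1 + o(1)$.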

Applying the above lemma $k - 1$ times, we obtain in particular that

\[ |J_0|^{2^{k-1}} \leq J_{k-1} \prod_{d=0}^{k-2} P_d^{2^{k-2-d}}. \qquad (5.21) \]

Observe from (5.17) that

\[ J_0 = \mathbb{E}\Big( \prod_{i=0}^{k-1} f_i(\phi_i(y)) \,\Big|\, y \in \mathbb{Z}_N^{k-1} \Big). \qquad (5.22) \]

Proof of Proposition 5.3. We will apply (5.21), observing that (5.22) can be used to
count configurations $(x, x + c_1 r, \ldots, x + c_{k-1} r)$ by making a judicious choice of the
functions $\phi_i$. For $y = (y_1, \ldots, y_{k-1})$, take

\[ \phi_i(y) := \sum_{j=1}^{k-1} \Big( 1 - \frac{c_i}{c_j} \Big) y_j \]

for $i = 0, \ldots, k - 1$. Then $\phi_0(y) = y_1 + \cdots + y_{k-1}$, $\phi_i(y)$ does not depend on $y_i$ and, as
one can easily check, for any $y$ we have $\phi_i(y) = x + c_i r$ where

\[ r = - \sum_{i=1}^{k-1} \frac{y_i}{c_i}. \]

Note that (5.20) is simply the case $k = 3$, $c_j = j$ of this more general construction. Now
the map $\Phi : \mathbb{Z}_N^{k-1} \to \mathbb{Z}_N^2$ defined by

\[ \Phi(y) := \Big( y_1 + \cdots + y_{k-1}, \ \frac{y_1}{c_1} + \frac{y_2}{c_2} + \cdots + \frac{y_{k-1}}{c_{k-1}} \Big) \]

is a uniform cover, and so

\[ \mathbb{E}\Big( \prod_{j=0}^{k-1} f_j(x + c_j r) \,\Big|\, x, r \in \mathbb{Z}_N \Big) = \mathbb{E}\Big( \prod_{i=0}^{k-1} f_i(\phi_i(y)) \,\Big|\, y \in \mathbb{Z}_N^{k-1} \Big) = J_0 \qquad (5.23) \]

thanks to (5.22) (this generalises (5.14)). On the other hand we have $P_d = 1 + o(1)$ for
each $0 \leq d \leq k - 2$, since the $k$-pseudorandom hypothesis on $\nu$ implies the
$(2^d, k - 1 + d, k)$-linear forms condition. Applying (5.21) we thus obtain

\[ J_0^{2^{k-1}} \leq (1 + o(1)) J_{k-1} \qquad (5.24) \]

(this generalises (5.15)). Fix $y \in \mathbb{Z}_N^{k-1}$. As $S$ ranges over all subsets of $\{1, \ldots, k - 1\}$,
$\phi_0(y^{(S)})$ ranges over a $(k - 1)$-dimensional cube $\{x + \omega \cdot h : \omega \in \{0,1\}^{k-1}\}$ where
$x = y_1 + \cdots + y_{k-1}$ and $h_i := y_i' - y_i$, $i = 1, \ldots, k - 1$. Thus we may write



\[ J_{k-1} = \mathbb{E}\Big( W(x, h) \prod_{\omega \in \{0,1\}^{k-1}} f_0(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) \qquad (5.25) \]

where the weight function $W(x, h)$ is given by

\[ W(x, h) = \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1}} \prod_{i=1}^{k-1} \nu^{1/2}(\phi_i(y + \omega h)) \,\Big|\, y_1, \ldots, y_{k-2} \in \mathbb{Z}_N \Big)
 = \mathbb{E}\Big( \prod_{i=1}^{k-1} \prod_{\omega \in \{0,1\}^{k-1} : \omega_i = 0} \nu(\phi_i(y + \omega h)) \,\Big|\, y_1, \ldots, y_{k-2} \in \mathbb{Z}_N \Big) \]

(this generalises (5.16)). Here, $\omega h \in \mathbb{Z}_N^{k-1}$ is the vector with components $(\omega h)_j := \omega_j h_j$
for $1 \leq j \leq k - 1$, and $y \in \mathbb{Z}_N^{k-1}$ is the vector with components $y_j$ for $1 \leq j \leq k - 2$ and
$y_{k-1} := x - y_1 - \ldots - y_{k-2}$. Now by the definition of the $U^{k-1}$ norm we have

\[ \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1}} f_0(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = \|f_0\|_{U^{k-1}}^{2^{k-1}}. \]

To prove (5.13) it therefore suffices, by (5.23), (5.24) and (5.25), to prove that

\[ \mathbb{E}\Big( (W(x, h) - 1) \prod_{\omega \in \{0,1\}^{k-1}} f_0(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = o(1). \]

Using (5.12), it suffices to show that

\[ \mathbb{E}\Big( |W(x, h) - 1| \prod_{\omega \in \{0,1\}^{k-1}} \nu(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = o(1). \]

Thus by Cauchy-Schwarz it will be enough to prove

Lemma 5.5 ($\nu$ covers its own cubes uniformly). For $n = 0, 2$, we have

\[ \mathbb{E}\Big( |W(x, h) - 1|^n \prod_{\omega \in \{0,1\}^{k-1}} \nu(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = 0^n + o(1). \]

Proof. Expanding out the square, it then suffices to show that

\[ \mathbb{E}\Big( W(x, h)^q \prod_{\omega \in \{0,1\}^{k-1}} \nu(x + \omega \cdot h) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = 1 + o(1) \]

for $q = 0, 1, 2$. This can be achieved by three applications of the linear forms condition,
as follows:

$q = 0$. Use the $(2^{k-1}, k, 1)$-linear forms property with variables $x, h_1, \ldots, h_{k-1}$ and forms

    $x + \omega \cdot h$,  $\omega \in \{0,1\}^{k-1}$.

$q = 1$. Use the $(2^{k-2}(k+1), 2k - 2, k)$-linear forms property with variables $x, h_1, \ldots, h_{k-1}$,
$y_1, \ldots, y_{k-2}$ and forms

    $\phi_i(y + \omega h)$,  $\omega \in \{0,1\}^{k-1}$, $\omega_i = 0$, $1 \leq i \leq k - 1$;
    $x + \omega \cdot h$,  $\omega \in \{0,1\}^{k-1}$.

$q = 2$. Use the $(k \cdot 2^{k-1}, 3k - 4, k)$-linear forms property with variables $x, h_1, \ldots, h_{k-1}$,
$y_1, \ldots, y_{k-2}$, $y_1', \ldots, y'_{k-2}$ and forms

    $\phi_i(y + \omega h)$,  $\omega \in \{0,1\}^{k-1}$, $\omega_i = 0$, $1 \leq i \leq k - 1$;
    $\phi_i(y' + \omega h)$,  $\omega \in \{0,1\}^{k-1}$, $\omega_i = 0$, $1 \leq i \leq k - 1$;
    $x + \omega \cdot h$,  $\omega \in \{0,1\}^{k-1}$.

Here of course we adopt the convention that $y_{k-1} = x - y_1 - \ldots - y_{k-2}$ and
$y'_{k-1} = x - y_1' - \ldots - y'_{k-2}$. This completes the proof of the lemma, and hence of Proposition
5.3. □
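The linear forms $\phi_i(y) = \sum_{j=1}^{k-1} (1 - c_i/c_j) y_j$ used in the proof of Proposition 5.3 can be sanity-checked by exhaustive enumeration. The sketch below assumes the illustrative values `N = 7`, `k = 4` and `c_j = j` (hypothetical test parameters). It checks that $\phi_i(y) = x + c_i r$, that the coefficient of $y_i$ in $\phi_i$ vanishes, and that $y \mapsto (x, r)$ has fibres of equal size, which is equivalent to the uniform cover property of the map $\Phi$.

```python
from collections import Counter
from itertools import product

N, k = 7, 4
c = [0, 1, 2, 3]                                   # c_j := j, so that c_0 = 0

def inv(a):
    return pow(a % N, -1, N)                       # inverse in Z_N (Python 3.8+)

def phi(i, y):
    # y = (y_1, ..., y_{k-1}); the coefficient of y_j is 1 - c_i / c_j in Z_N.
    return sum((1 - c[i] * inv(c[j])) * y[j - 1] for j in range(1, k)) % N

ok_progression = True
fibres = Counter()
for y in product(range(N), repeat=k - 1):
    x = sum(y) % N
    r = -sum(inv(c[j]) * y[j - 1] for j in range(1, k)) % N
    ok_progression &= all(phi(i, y) == (x + c[i] * r) % N for i in range(k))
    fibres[(x, r)] += 1                            # count preimages of each (x, r)

# Coefficient of y_i in phi_i, for i = 1, ..., k-1; each should vanish.
coeff_of_yi = [(1 - c[i] * inv(c[i])) % N for i in range(1, k)]
```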



6. GOWERS ANTI-UNIFORMITY

Having studied the $U^{k-1}$ norm, we now introduce the dual $(U^{k-1})^*$ norm, defined in the
usual manner as

\[ \|g\|_{(U^{k-1})^*} := \sup\{ |\langle f, g \rangle| : f \in U^{k-1}(\mathbb{Z}_N), \ \|f\|_{U^{k-1}} \leq 1 \}. \qquad (6.1) \]

We say that $g$ is Gowers anti-uniform if $\|g\|_{(U^{k-1})^*} = O(1)$ and $\|g\|_{L^\infty} = O(1)$. If $g$ is
Gowers anti-uniform, and if $|\langle f, g \rangle|$ is large, then $f$ cannot be Gowers uniform (have
small Gowers norm) since

\[ |\langle f, g \rangle| \leq \|f\|_{U^{k-1}} \|g\|_{(U^{k-1})^*}. \]

Thus Gowers anti-uniform functions can be thought of as "obstructions to Gowers
uniformity". The $(U^{k-1})^*$ are well-defined norms for $k \geq 3$ since $U^{k-1}$ is then a genuine
norm (not just a seminorm). In this section we show how to generate a large class of
Gowers anti-uniform functions, in order that we can decompose an arbitrary function
$f$ into a Gowers uniform part and a bounded Gowers anti-uniform part in the next
section.

Remark. In the $k = 3$ case we have the explicit formula

\[ \|g\|_{(U^2)^*} = \Big( \sum_{\xi \in \mathbb{Z}_N} |\hat g(\xi)|^{4/3} \Big)^{3/4} = \|\hat g\|_{4/3}. \qquad (6.2) \]

We will not, however, require this fact except for motivational purposes.

A basic way to generate Gowers anti-uniform functions is the following. For each
function $F \in L^1(\mathbb{Z}_N)$, define the dual function $\mathcal{D}F$ of $F$ by

\[ \mathcal{D}F(x) := \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} F(x + \omega \cdot h) \,\Big|\, h \in \mathbb{Z}_N^{k-1} \Big) \qquad (6.3) \]

where $0^{k-1}$ denotes the element of $\{0,1\}^{k-1}$ consisting entirely of zeroes.

Remark. Such functions have arisen recently in work of Host and Kra [28] in the ergodic
theory setting (see also [1]).

The next lemma, while simple, is fundamental to our entire approach; it asserts that
if a function majorised by a pseudorandom measure $\nu$ is not Gowers uniform, then
it correlates$^{13}$ with a bounded Gowers anti-uniform function. Boundedness is the key
feature here. The idea in proving Theorem 3.5 will then be to project out the influence
of these bounded Gowers anti-uniform functions (through the machinery of conditional
expectation) until one is only left with a Gowers uniform remainder, which can be
discarded by the generalised von Neumann theorem (Proposition 5.3).

13 This idea was inspired by the proof of the Furstenberg structure theorem [10, 13]; a key point in
that proof being that if a system is not (relatively) weakly mixing, then it must contain a non-trivial
(relatively) almost periodic function, which can then be projected out via conditional expectation. A
similar idea also occurs in the proof of the Szemeredi regularity lemma [38].

Lemma 6.1 (Lack of Gowers uniformity implies correlation). Let $\nu$ be a $k$-pseudorandom
measure, and let $F \in L^1(\mathbb{Z}_N)$ be any function. Then we have the identities

\[ \langle F, \mathcal{D}F \rangle = \|F\|_{U^{k-1}}^{2^{k-1}} \qquad (6.4) \]

and

\[ \|\mathcal{D}F\|_{(U^{k-1})^*} = \|F\|_{U^{k-1}}^{2^{k-1}-1}. \qquad (6.5) \]

If furthermore we assume the bounds

\[ |F(x)| \leq \nu(x) + 1 \text{ for all } x \in \mathbb{Z}_N \]

then we have the estimate

\[ \|\mathcal{D}F\|_{L^\infty} \leq 2^{2^{k-1}-1} + o(1). \qquad (6.6) \]

Proof. The identity (6.4) is clear just by expanding out both sides using (6.3), (5.4).
To prove (6.5) we may of course assume $F$ is not identically zero. By (6.1) and (6.4) it
suffices to show that

\[ |\langle f, \mathcal{D}F \rangle| \leq \|f\|_{U^{k-1}} \|F\|_{U^{k-1}}^{2^{k-1}-1} \]

for arbitrary functions $f$. But by (6.3) the left-hand side is simply the Gowers inner
product $\langle (f_\omega)_{\omega \in \{0,1\}^{k-1}} \rangle_{U^{k-1}}$, where $f_\omega := f$ when $\omega = 0$ and $f_\omega := F$ otherwise. The
claim then follows from the Gowers Cauchy-Schwarz inequality (5.5).

Finally, we observe that (6.6) is a consequence of the linear forms condition. Bounding
$F$ by $2(\nu + 1)/2 = 2\nu_{1/2}$, it suffices to show that

\[ \mathcal{D}\nu_{1/2}(x) \leq 1 + o(1) \]

uniformly in the choice of $x \in \mathbb{Z}_N$. The left-hand side can be expanded using (6.3) as

\[ \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} \nu_{1/2}(x + \omega \cdot h) \,\Big|\, h \in \mathbb{Z}_N^{k-1} \Big). \]

By the linear forms condition (3.1) (and Lemma 3.4) this expression is $1 + o(1)$ (this
is the only place in the paper that we appeal to the linear forms condition in the
non-homogeneous case where some $b_i \neq 0$; here, all the $b_i$ are equal to $x$). Note that (3.3)
corresponds to the $k = 3$ case of this application of the linear forms condition. □
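For $k = 3$ the dual function (6.3) is an average of $F(x + h_1) F(x + h_2) F(x + h_1 + h_2)$, and the identity (6.4) together with the inequality used to prove (6.5) can be tested by brute force. The sketch below assumes the illustrative modulus `N = 5` and random real test functions `F` and `f`; the helper names are hypothetical.

```python
import random
from itertools import product

N = 5
random.seed(3)
F = [random.uniform(-1, 1) for _ in range(N)]
f = [random.uniform(-1, 1) for _ in range(N)]

def dual(F):
    """Dual function (6.3) for k = 3."""
    return [
        sum(F[(x + h1) % N] * F[(x + h2) % N] * F[(x + h1 + h2) % N]
            for h1, h2 in product(range(N), repeat=2)) / N ** 2
        for x in range(N)
    ]

def inner(f, g):
    """Inner product <f, g> = E(f g)."""
    return sum(a * b for a, b in zip(f, g)) / N

def u2(f):
    """U^2 norm computed directly from (5.4), independently of dual()."""
    total = sum(f[x] * f[(x + h1) % N] * f[(x + h2) % N] * f[(x + h1 + h2) % N]
                for x, h1, h2 in product(range(N), repeat=3)) / N ** 3
    return max(total, 0.0) ** 0.25

DF = dual(F)
identity_gap = abs(inner(F, DF) - u2(F) ** 4)       # (6.4) with 2^{k-1} = 4
correlation = abs(inner(f, DF))                     # |<f, DF>|
bound = u2(f) * u2(F) ** 3                          # ||f||_{U^2} ||F||_{U^2}^3
```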

Remarks. Observe that if $P : \mathbb{Z}_N \to \mathbb{Z}_N$ is any polynomial on $\mathbb{Z}_N$ of degree at most
$k - 2$, and $F(x) = e^{2\pi i P(x)/N}$, then$^{14}$ $\mathcal{D}F = F$; this is basically a reflection of the fact
that taking $k - 1$ successive differences of $P$ yields the zero function. Hence by the
above lemma $\|F\|_{(U^{k-1})^*} \leq 1$, and thus $F$ is Gowers anti-uniform. One should keep
these "polynomially quasiperiodic" functions $e^{2\pi i P(x)/N}$ in mind as model examples of
functions of the form $\mathcal{D}F$, whilst bearing in mind that they are not the only examples$^{15}$.
For some further discussion on the role of such polynomials of degree $k - 2$ in determining
Gowers uniformity especially in the $k = 4$ case, see [18, 19]. Very roughly speaking,
Gowers uniform functions are analogous to the notion of "weakly mixing" functions that
appear in ergodic theory proofs of Szemeredi's theorem, whereas Gowers anti-uniform
functions are somewhat analogous to the notion of "almost periodic" functions. When
$k = 3$ there is a more precise relation with linear exponentials (which are the same thing
as characters on $\mathbb{Z}_N$). When $\nu = 1$, for example, one has the explicit formula

\[ \mathcal{D}F(x) = \sum_{\xi \in \mathbb{Z}_N} |\hat F(\xi)|^2 \hat F(\xi) e^{2\pi i x \xi / N}. \qquad (6.7) \]

14 To make this assertion precise, one has to generalise the notion of dual function to complex-valued
functions by inserting an alternating sequence of conjugation signs; see [19].

15 The situation again has an intriguing parallel with ergodic theory, in which the role of the Gowers
anti-uniform functions of order $k - 2$ appear to be played by $k - 2$-step nilfactors (see [28, 29, 44]),
which may contain polynomial eigenfunctions of order $k - 2$, but can also exhibit slightly more general
behaviour; see [14] for further discussion.

Suppose for sake of argument that $F$ is bounded pointwise in magnitude by 1. By
splitting the set of frequencies $\mathbb{Z}_N$ into the sets $S := \{\xi : |\hat F(\xi)| \geq \varepsilon\}$ and $\mathbb{Z}_N \setminus S$ one
sees that it is possible to write

\[ \mathcal{D}F(x) = \sum_{\xi \in S} a_\xi e^{2\pi i x \xi / N} + E(x), \]

where $|a_\xi| \leq 1$ and $\|E\|_{L^\infty} \leq \varepsilon$. Also, we have $|S| \leq \varepsilon^{-2}$. Thus $\mathcal{D}F$ is equal to a linear
combination of a few characters plus a small error.
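The formula (6.7) and the splitting into large and small frequencies can be illustrated numerically for a real-valued $F$ with $k = 3$. The sketch below uses hypothetical test data (`N = 7`, `eps = 0.5`, and an arbitrary `F` bounded by 1 in magnitude). It compares the dual function with the Fourier expression and checks the bounds $|S| \leq \varepsilon^{-2}$ and $\|E\|_{L^\infty} \leq \varepsilon$, both of which follow from Parseval's identity.

```python
import cmath
from itertools import product

N, eps = 7, 0.5
F = [1.0, -0.5, 0.25, 1.0, 0.0, -1.0, 0.5]        # arbitrary real F with |F| <= 1

def e(t):
    return cmath.exp(2j * cmath.pi * t / N)

# Fourier coefficients with the normalisation E(F(x) e^{-2 pi i x xi / N}).
Fhat = [sum(F[x] * e(-x * xi) for x in range(N)) / N for xi in range(N)]

# Dual function (6.3) for k = 3, computed from the definition.
dual = [sum(F[(x + h1) % N] * F[(x + h2) % N] * F[(x + h1 + h2) % N]
            for h1, h2 in product(range(N), repeat=2)) / N ** 2 for x in range(N)]

# Right-hand side of (6.7).
formula = [sum(abs(Fhat[xi]) ** 2 * Fhat[xi] * e(x * xi) for xi in range(N))
           for x in range(N)]

# Large spectrum S and the error term E(x) coming from the remaining frequencies.
S = [xi for xi in range(N) if abs(Fhat[xi]) >= eps]
error = [sum(abs(Fhat[xi]) ** 2 * Fhat[xi] * e(x * xi)
             for xi in range(N) if xi not in S) for x in range(N)]
```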

Once again, these remarks concerning the relation with harmonic analysis are included 
only for motivational purposes. 

Let us refer to a function of the form $\mathcal{D}F$, where $F$ is pointwise bounded by $\nu + 1$,
as a basic Gowers anti-uniform function. Observe from (6.6) that if $N$ is sufficiently
large, then all basic Gowers anti-uniform functions take values in the interval
$I := [-2^{2^{k-1}}, 2^{2^{k-1}}]$.

The following is a statement to the effect that the measure $\nu$ is uniformly distributed
with respect not just to each basic Gowers anti-uniform function (which is a special case
of (6.5)), but also to the algebra generated by such functions.

Proposition 6.2 (Uniform distribution wrt basic Gowers anti-uniform functions).
Suppose that $\nu$ is $k$-pseudorandom. Let $K \geq 1$ be a fixed integer, let $\Phi : I^K \to \mathbb{R}$ be a fixed
continuous function, let $\mathcal{D}F_1, \ldots, \mathcal{D}F_K$ be basic Gowers anti-uniform functions, and
define the function $\psi : \mathbb{Z}_N \to \mathbb{R}$ by

\[ \psi(x) := \Phi(\mathcal{D}F_1(x), \ldots, \mathcal{D}F_K(x)). \]

Then we have the estimate

\[ \langle \nu - 1, \psi \rangle = o_{K,\Phi}(1). \]

Furthermore if $\Phi$ ranges over a compact set $E \subset C^0(I^K)$ of the space $C^0(I^K)$ of
continuous functions on $I^K$ (in the uniform topology) then the bounds here are uniform in
$\Phi$ (i.e. one can replace $o_{K,\Phi}(1)$ with $o_{K,E}(1)$ in this case).

Remark. In light of the previous remarks, we see in particular that $\nu$ is uniformly
distributed with respect to any continuous function of polynomial phase functions such
as $e^{2\pi i P(x)/N}$, where $P$ has degree at most $k - 2$.

Proof. We will prove this result in two stages, first establishing the result for $\Phi$
polynomial and then using a Weierstrass approximation argument to deduce the general case.
Fix $K \geq 1$, and let $F_1, \ldots, F_K \in L^1(\mathbb{Z}_N)$ be fixed functions obeying the bounds

\[ |F_j(x)| \leq \nu(x) + 1 \text{ for all } x \in \mathbb{Z}_N, \ 1 \leq j \leq K. \]

By replacing $\nu$ by $(\nu + 1)/2$, dividing the $F_j$ by two, and using Lemma 3.4 as before,
we may strengthen this bound without loss of generality to

\[ |F_j(x)| \leq \nu(x) \text{ for all } x \in \mathbb{Z}_N, \ 1 \leq j \leq K. \qquad (6.8) \]

Lemma 6.3. Let $d \geq 1$. For any polynomial $P$ of $K$ variables and degree $d$ with real
coefficients (independent of $N$), we have

\[ \|P(\mathcal{D}F_1, \ldots, \mathcal{D}F_K)\|_{(U^{k-1})^*} = O_{K,d,P}(1). \]

Remark. It may seem surprising that there is no size restriction on $K$ or $d$, since
we are presumably going to use the linear forms or correlation conditions and we are
only assuming those conditions with bounded parameters. However whilst we do indeed
restrict the size of $m$ in (3.6), we do not need to restrict the size of $q$ in (3.5).

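Proof. By linearity it suffices to prove this when $P$ is a monomial. By enlarging $K$ to
at most $dK$ and repeating the functions $F_j$ as necessary, it in fact suffices to prove this
for the monomial $P(x_1, \ldots, x_K) = x_1 \cdots x_K$. Recalling the definition of $(U^{k-1})^*$, we are
thus required to show that

\[ \Big\langle f, \prod_{j=1}^K \mathcal{D}F_j \Big\rangle = O_K(1) \]

for all $f : \mathbb{Z}_N \to \mathbb{R}$ satisfying $\|f\|_{U^{k-1}} \leq 1$. By (6.3) the left-hand side can be expanded
as

\[ \mathbb{E}\Big( f(x) \prod_{j=1}^K \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} F_j(x + \omega \cdot h^{(j)}) \,\Big|\, h^{(j)} \in \mathbb{Z}_N^{k-1} \Big) \,\Big|\, x \in \mathbb{Z}_N \Big). \]

We can make the change of variables $h^{(j)} = h + H^{(j)}$ for any $h \in \mathbb{Z}_N^{k-1}$, and then average
over $h$, to rewrite this as

\[ \mathbb{E}\Big( f(x) \prod_{j=1}^K \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} F_j(x + \omega \cdot H^{(j)} + \omega \cdot h) \,\Big|\, H^{(j)} \in \mathbb{Z}_N^{k-1} \Big) \,\Big|\, x \in \mathbb{Z}_N; \ h \in \mathbb{Z}_N^{k-1} \Big). \]

Expanding the $j$ product and interchanging the expectations, we can rewrite this in
terms of the Gowers inner product as

\[ \mathbb{E}\big( \langle (f_{\omega,H})_{\omega \in \{0,1\}^{k-1}} \rangle_{U^{k-1}} \mid H \in (\mathbb{Z}_N^{k-1})^K \big) \]

where $H := (H^{(1)}, \ldots, H^{(K)})$, $f_{0,H} := f$, and $f_{\omega,H} := g_{\omega \cdot H}$ for $\omega \neq 0^{k-1}$, where
$\omega \cdot H := (\omega \cdot H^{(1)}, \ldots, \omega \cdot H^{(K)})$ and

\[ g_{u^{(1)}, \ldots, u^{(K)}}(x) := \prod_{j=1}^K F_j(x + u^{(j)}) \text{ for all } u^{(1)}, \ldots, u^{(K)} \in \mathbb{Z}_N. \qquad (6.9) \]

By the Gowers-Cauchy-Schwarz inequality (5.5) we can bound this as

\[ \mathbb{E}\Big( \|f\|_{U^{k-1}} \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} \|g_{\omega \cdot H}\|_{U^{k-1}} \,\Big|\, H \in (\mathbb{Z}_N^{k-1})^K \Big) \]

so to prove the claim it will suffice to show that

\[ \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{k-1} : \omega \neq 0^{k-1}} \|g_{\omega \cdot H}\|_{U^{k-1}} \,\Big|\, H \in (\mathbb{Z}_N^{k-1})^K \Big) = O_K(1). \]

By Holder's inequality it will suffice to show that

\[ \mathbb{E}\big( \|g_{\omega \cdot H}\|_{U^{k-1}}^{2^{k-1}-1} \mid H \in (\mathbb{Z}_N^{k-1})^K \big) = O_K(1) \]

for each $\omega \in \{0,1\}^{k-1} \setminus 0^{k-1}$.

Fix $\omega$. Since $2^{k-1} - 1 \leq 2^{k-1}$, another application of Holder's inequality shows that it
in fact suffices to show that

\[ \mathbb{E}\big( \|g_{\omega \cdot H}\|_{U^{k-1}}^{2^{k-1}} \mid H \in (\mathbb{Z}_N^{k-1})^K \big) = O_K(1). \]

Since $\omega \neq 0^{k-1}$, the map $H \mapsto \omega \cdot H$ is a uniform covering of $\mathbb{Z}_N^K$ by $(\mathbb{Z}_N^{k-1})^K$. Thus by
(4.1) we can rewrite the left-hand side as

\[ \mathbb{E}\big( \|g_{u^{(1)}, \ldots, u^{(K)}}\|_{U^{k-1}}^{2^{k-1}} \mid u^{(1)}, \ldots, u^{(K)} \in \mathbb{Z}_N \big). \]

Expanding this out using (5.4) and (6.9), we can rewrite the left-hand side as

\[ \mathbb{E}\Big( \prod_{\tilde\omega \in \{0,1\}^{k-1}} \prod_{j=1}^K F_j(x + u^{(j)} + h \cdot \tilde\omega) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1}, u^{(1)}, \ldots, u^{(K)} \in \mathbb{Z}_N \Big). \]

This factorises as

\[ \mathbb{E}\Big( \prod_{j=1}^K \mathbb{E}\Big( \prod_{\tilde\omega \in \{0,1\}^{k-1}} F_j(x + u^{(j)} + h \cdot \tilde\omega) \,\Big|\, u^{(j)} \in \mathbb{Z}_N \Big) \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big). \]

Applying (6.8), we reduce to showing that

\[ \mathbb{E}\Big( \mathbb{E}\Big( \prod_{\tilde\omega \in \{0,1\}^{k-1}} \nu(x + u + h \cdot \tilde\omega) \,\Big|\, u \in \mathbb{Z}_N \Big)^K \,\Big|\, x \in \mathbb{Z}_N, h \in \mathbb{Z}_N^{k-1} \Big) = O_K(1). \]

We can make the change of variables $y := x + u$, and then discard the redundant $x$
averaging, to reduce to showing that

\[ \mathbb{E}\Big( \mathbb{E}\Big( \prod_{\tilde\omega \in \{0,1\}^{k-1}} \nu(y + h \cdot \tilde\omega) \,\Big|\, y \in \mathbb{Z}_N \Big)^K \,\Big|\, h \in \mathbb{Z}_N^{k-1} \Big) = O_K(1). \]

Now we are ready to apply the correlation condition (Definition 3.2). This is, in fact,
the only time we will use that condition. It gives

\[ \mathbb{E}\Big( \prod_{\tilde\omega \in \{0,1\}^{k-1}} \nu(y + h \cdot \tilde\omega) \,\Big|\, y \in \mathbb{Z}_N \Big) \leq \sum_{\tilde\omega, \tilde\omega' \in \{0,1\}^{k-1} : \tilde\omega \neq \tilde\omega'} \tau(h \cdot (\tilde\omega - \tilde\omega')) \]

where, recall, $\tau$ is a weight function satisfying $\mathbb{E}(\tau^q) = O_q(1)$ for all $q$. Applying the
triangle inequality in $L^K(\mathbb{Z}_N^{k-1})$, it thus suffices to show that

\[ \mathbb{E}\big( \tau(h \cdot (\tilde\omega - \tilde\omega'))^K \mid h \in \mathbb{Z}_N^{k-1} \big) = O_K(1) \]

for all distinct $\tilde\omega, \tilde\omega' \in \{0,1\}^{k-1}$. But the map $h \mapsto h \cdot (\tilde\omega - \tilde\omega')$ is a uniform covering of
$\mathbb{Z}_N$ by $(\mathbb{Z}_N)^{k-1}$, so by (4.1) the left-hand side is just $\mathbb{E}(\tau^K)$, which is $O_K(1)$. □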
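The final step of this proof relies on the map $h \mapsto h \cdot (\tilde\omega - \tilde\omega')$ being a uniform covering of $\mathbb{Z}_N$ whenever $\tilde\omega \neq \tilde\omega'$. A brute-force check of the fibre sizes is sketched below for the assumed illustrative values `N = 5` and `k = 3`; it relies on $N$ being prime, so that the non-zero coefficients $\pm 1$ are invertible.

```python
from collections import Counter
from itertools import product

N, k = 5, 3
cube = list(product((0, 1), repeat=k - 1))

uniform = True
for w, w2 in product(cube, repeat=2):
    if w == w2:
        continue                                   # only distinct vertices matter
    diff = [a - b for a, b in zip(w, w2)]          # entries in {-1, 0, 1}, not all zero
    # Count how many h in Z_N^{k-1} are sent to each value of h . (w - w2).
    fibres = Counter(sum(hj * dj for hj, dj in zip(h, diff)) % N
                     for h in product(range(N), repeat=k - 1))
    uniform &= len(fibres) == N and set(fibres.values()) == {N ** (k - 2)}
```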

Proof of Proposition 6.2. Let $\Phi$, $\psi$ be as in the Proposition, and let $\varepsilon > 0$ be arbitrary.
From (6.6) we know that the basic Gowers anti-uniform functions $\mathcal{D}F_1, \ldots, \mathcal{D}F_K$ take
values in the compact interval $I := [-2^{2^{k-1}}, 2^{2^{k-1}}]$ introduced earlier. By the Weierstrass
approximation theorem, we can thus find a polynomial $P$ (depending only on $K$ and $\varepsilon$)
such that

\[ \|\Phi(\mathcal{D}F_1, \ldots, \mathcal{D}F_K) - P(\mathcal{D}F_1, \ldots, \mathcal{D}F_K)\|_{L^\infty} \leq \varepsilon \]

and thus by (2.4) and taking absolute values inside the inner product, we have

\[ |\langle \nu - 1, \Phi(\mathcal{D}F_1, \ldots, \mathcal{D}F_K) - P(\mathcal{D}F_1, \ldots, \mathcal{D}F_K) \rangle| \leq (2 + o(1)) \varepsilon. \]

On the other hand, from Lemma 6.3, Lemma 5.2 and (6.1) we have

\[ \langle \nu - 1, P(\mathcal{D}F_1, \ldots, \mathcal{D}F_K) \rangle = o_{K,\varepsilon}(1) \]

since $P$ depends on $K$ and $\varepsilon$. Combining the two estimates we thus see that for $N$
sufficiently large (depending on $K$ and $\varepsilon$) we have

\[ |\langle \nu - 1, \Phi(\mathcal{D}F_1, \ldots, \mathcal{D}F_K) \rangle| \leq 4 \varepsilon \]

(for instance). Since $\varepsilon > 0$ was arbitrary, the claim follows. It is clear that this
argument also gives the uniform bounds when $\Phi$ ranges over a compact set (by covering
this compact set by finitely many balls of radius $\varepsilon$ in the uniform topology). □

Remarks. The philosophy behind Proposition 6.2 and its proof is that $(U^{k-1})^*$ respects, to some degree, the algebra structure of the space of functions on $\mathbb{Z}_N$. However, $\|\cdot\|_{(U^{k-1})^*}$ is not itself an algebra norm even in the model case $k = 3$, as can be seen from (6.2) by recalling that $\|g\|_{(U^2)^*} = \|\hat g\|_{4/3}$. Note, however, that $\|g\|_{(U^2)^*} \le \|g\|_A$, where $\|g\|_A := \|\hat g\|_1$ is the Wiener norm. From Young's inequality we see that the Wiener norm is an algebra norm, that is to say $\|gh\|_A \le \|g\|_A \|h\|_A$. Thus while the $(U^2)^*$ norm is not an algebra norm, it is at least majorised by an algebra norm.
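The two inequalities in this remark are easy to test numerically. The following is a minimal sketch, assuming the normalisation $\hat g(\xi) = \mathbb{E}(g(x) e^{-2\pi i x\xi/N})$ with $\ell^p$ norms taken as sums over $\xi$; the helper names are illustrative and not taken from the paper.

```python
import cmath
import random

def fourier(g):
    """Normalised Fourier transform on Z_N: g^(xi) = E(g(x) e(-x xi / N))."""
    N = len(g)
    return [sum(g[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def lp_norm(coeffs, p):
    """l^p norm of a sequence of Fourier coefficients (sum, not average)."""
    return sum(abs(c) ** p for c in coeffs) ** (1.0 / p)

def wiener_norm(g):
    """||g||_A := ||g^||_1."""
    return lp_norm(fourier(g), 1)

def u2_dual_norm(g):
    """||g||_{(U^2)^*} = ||g^||_{4/3}, as recalled in the remark."""
    return lp_norm(fourier(g), 4 / 3)

random.seed(0)
N = 11
g = [random.uniform(-1, 1) for _ in range(N)]
h = [random.uniform(-1, 1) for _ in range(N)]
gh = [a * b for a, b in zip(g, h)]

# Algebra property of the Wiener norm, and majorisation of the (U^2)^* norm.
print(wiener_norm(gh) <= wiener_norm(g) * wiener_norm(h) + 1e-9)
print(u2_dual_norm(g) <= wiener_norm(g) + 1e-9)
```

With this normalisation the transform of a product is the convolution of the transforms, so the first inequality is exactly Young's inequality in $\ell^1$.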

Now (6.7) easily implies that if $0 \le F(x) \le \nu_{\mathrm{const}}(x)$ then $\|\mathcal{D}F\|_A \le 1$, and so in this case we really have identified an algebra norm (the Wiener norm) such that if $\|f\|_{U^2}$ is large then $f$ correlates with a bounded function with small algebra norm. The $(U^{k-1})^*$ norms can thus be thought of as combinatorial variants of the Wiener algebra norm which apply to more general values of $k$ than the Fourier case $k = 3$. (See also [10] for a slightly different generalization of the Wiener algebra to the case $k > 3$.)

For the majorant $\nu$ that we will use to majorise the primes, it is quite likely that $\|\mathcal{D}F\|_A = O(1)$ whenever $0 \le F(x) \le \nu(x)$, which would allow us to use the Wiener algebra $A$ in place of $(U^2)^*$ in the $k = 3$ case of the arguments here. To obtain this estimate, however, requires some serious harmonic analysis related to the restriction phenomenon (the paper [23] may be consulted for further information). Such a property does not seem to follow simply from the pseudorandomness of $\nu$, and generalisation to $U^{k-1}$, $k > 3$, seems very difficult (it is not even clear what the form of such a generalisation would be).

For these reasons, our proof of Proposition 6.2 does not mention any algebra norms explicitly.

7. Generalised Bohr sets and σ-algebras

To use Proposition 6.2 we shall associate a σ-algebra to each basic Gowers anti-uniform function, such that the measurable functions in each such algebra can be approximated by a function of the type considered in Proposition 6.2. We begin by setting out our notation for σ-algebras.

Definition 7.1. A σ-algebra $\mathcal{B}$ in $\mathbb{Z}_N$ is any collection of subsets of $\mathbb{Z}_N$ which contains the empty set $\emptyset$ and the full set $\mathbb{Z}_N$, and is closed under complementation, unions and intersections. As $\mathbb{Z}_N$ is a finite set, we will not need to distinguish between countable and uncountable unions or intersections. We define the atoms of a σ-algebra to be the minimal non-empty elements of $\mathcal{B}$ (with respect to set inclusion); it is clear that the atoms in $\mathcal{B}$ form a partition of $\mathbb{Z}_N$, and $\mathcal{B}$ consists precisely of arbitrary unions of its atoms (including the empty union $\emptyset$). A function $f \in L^q(\mathbb{Z}_N)$ is said to be measurable with respect to a σ-algebra $\mathcal{B}$ if all the level sets $\{f^{-1}(\{x\}) : x \in \mathbb{R}\}$ of $f$ lie in $\mathcal{B}$, or equivalently if $f$ is constant on each of the atoms of $\mathcal{B}$.

We define $L^q(\mathcal{B}) \subseteq L^q(\mathbb{Z}_N)$ to be the subspace of $L^q(\mathbb{Z}_N)$ consisting of $\mathcal{B}$-measurable functions, equipped with the same $L^q$ norm. We can then define the conditional expectation operator $f \mapsto \mathbb{E}(f|\mathcal{B})$ to be the orthogonal projection of $L^2(\mathbb{Z}_N)$ to $L^2(\mathcal{B})$; this is of course also defined on all the other $L^q(\mathbb{Z}_N)$ spaces since they are all the same vector space. An equivalent definition of conditional expectation is

$$\mathbb{E}(f|\mathcal{B})(x) := \mathbb{E}(f(y) \mid y \in \mathcal{B}(x))$$

for all $x \in \mathbb{Z}_N$, where $\mathcal{B}(x)$ is the unique atom in $\mathcal{B}$ which contains $x$. It is clear that conditional expectation is a linear self-adjoint orthogonal projection on $L^2(\mathbb{Z}_N)$, is a contraction on $L^q(\mathbb{Z}_N)$ for every $1 \le q \le \infty$, preserves non-negativity, and also preserves constant functions. Also, if $\mathcal{B}'$ is a subalgebra of $\mathcal{B}$ then $\mathbb{E}(\mathbb{E}(f|\mathcal{B})|\mathcal{B}') = \mathbb{E}(f|\mathcal{B}')$.

If $\mathcal{B}_1, \ldots, \mathcal{B}_K$ are σ-algebras, we use $\bigvee_{j=1}^{K} \mathcal{B}_j = \mathcal{B}_1 \vee \ldots \vee \mathcal{B}_K$ to denote the σ-algebra generated by these algebras, or in other words the algebra whose atoms are the intersections of atoms in $\mathcal{B}_1, \ldots, \mathcal{B}_K$. We adopt the usual convention that when $K = 0$, the join $\bigvee_{j=1}^{K} \mathcal{B}_j$ is just the trivial σ-algebra $\{\emptyset, \mathbb{Z}_N\}$.
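Because $\mathbb{Z}_N$ is finite, all of these notions are concrete: a σ-algebra is a partition into atoms, conditional expectation averages over atoms, and the join intersects atoms. The sketch below illustrates this with exact rational arithmetic; the representation (a list of atoms, each a list of points) and the function names are choices made for this example only.

```python
from fractions import Fraction

def atoms_from_labels(labels):
    """Partition Z_N = {0, ..., N-1} into atoms of points sharing a label."""
    atoms = {}
    for x, lab in enumerate(labels):
        atoms.setdefault(lab, []).append(x)
    return list(atoms.values())

def cond_exp(f, atoms):
    """E(f|B)(x): the average of f over the atom B(x) containing x."""
    out = [None] * len(f)
    for A in atoms:
        avg = sum(Fraction(f[x]) for x in A) / len(A)
        for x in A:
            out[x] = avg
    return out

def join(atoms1, atoms2):
    """Atoms of B1 v B2: the non-empty intersections of atoms of B1 and B2."""
    N = sum(len(A) for A in atoms1)
    lab1, lab2 = [None] * N, [None] * N
    for i, A in enumerate(atoms1):
        for x in A:
            lab1[x] = i
    for i, A in enumerate(atoms2):
        for x in A:
            lab2[x] = i
    return atoms_from_labels(list(zip(lab1, lab2)))

N = 12
B1 = atoms_from_labels([x % 2 for x in range(N)])   # atoms: residue classes mod 2
B2 = atoms_from_labels([x % 3 for x in range(N)])   # atoms: residue classes mod 3
B = join(B1, B2)                                    # atoms: residue classes mod 6
f = [x * x for x in range(N)]

# Tower property: B1 is a subalgebra of B, so E(E(f|B)|B1) = E(f|B1).
print(cond_exp(cond_exp(f, B), B1) == cond_exp(f, B1))
```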

We now construct the basic σ-algebras that we shall use. We view the basic Gowers anti-uniform functions as generalizations of complex exponentials, and the atoms of the σ-algebras we use can be thought of as "generalised Bohr sets".

Proposition 7.2 (Each function generates a σ-algebra). Let $\nu$ be a $k$-pseudorandom measure, let $0 < \varepsilon < 1$ and $0 < \eta < 1/2$ be parameters, and let $G \in L^\infty(\mathbb{Z}_N)$ be a function taking values in the interval $I := [-2^{2^{k-1}}, 2^{2^{k-1}}]$. Then there exists a σ-algebra $\mathcal{B}_{\varepsilon,\eta}(G)$ with the following properties:

• ($G$ lies in its own σ-algebra) For any σ-algebra $\mathcal{B}$, we have

$$\|G - \mathbb{E}(G \mid \mathcal{B} \vee \mathcal{B}_{\varepsilon,\eta}(G))\|_{L^\infty(\mathbb{Z}_N)} \le \varepsilon. \qquad (7.1)$$

• (Bounded complexity) $\mathcal{B}_{\varepsilon,\eta}(G)$ is generated by at most $O(1/\varepsilon)$ atoms.

• (Approximation by continuous functions of $G$) If $A$ is any atom in $\mathcal{B}_{\varepsilon,\eta}(G)$, then there exists a continuous function $\Psi_A : I \to [0,1]$ such that

$$\|(1_A - \Psi_A(G))(\nu + 1)\|_{L^1(\mathbb{Z}_N)} = O(\eta). \qquad (7.2)$$

Furthermore, $\Psi_A$ lies in a fixed compact set $E = E_{\varepsilon,\eta}$ of $C^0(I)$ (which is independent of $G$, $\nu$, $N$, or $A$).

Proof. Observe from Fubini's theorem and (2.4) that

$$\int_0^1 \sum_{n \in \mathbb{Z}} \mathbb{E}\big( 1_{G(x) \in [\varepsilon(n - \eta + \alpha),\, \varepsilon(n + \eta + \alpha)]} (\nu(x) + 1) \ \big|\ x \in \mathbb{Z}_N \big)\, d\alpha = 2\eta\, \mathbb{E}(\nu(x) + 1 \mid x \in \mathbb{Z}_N) = O(\eta)$$

and hence by the pigeonhole principle there exists $0 \le \alpha \le 1$ such that

$$\sum_{n \in \mathbb{Z}} \mathbb{E}\big( 1_{G(x) \in [\varepsilon(n - \eta + \alpha),\, \varepsilon(n + \eta + \alpha)]} (\nu(x) + 1) \ \big|\ x \in \mathbb{Z}_N \big) = O(\eta). \qquad (7.3)$$

We now set $\mathcal{B}_{\varepsilon,\eta}(G)$ to be the σ-algebra whose atoms are the sets $G^{-1}([\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha)))$ for $n \in \mathbb{Z}$. This is well-defined since the intervals $[\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha))$ tile the real line.

It is clear that if $\mathcal{B}$ is an arbitrary σ-algebra, then on any atom of $\mathcal{B} \vee \mathcal{B}_{\varepsilon,\eta}(G)$, the function $G$ takes values in an interval of diameter $\varepsilon$, which yields (7.1). Now we verify the approximation by continuous functions property. Let $A := G^{-1}([\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha)))$ be an atom. Since $G$ takes values in $I$, we may assume that $n = O(1/\varepsilon)$, since $A$ is empty otherwise; note that this already establishes the bounded complexity property. Let $\psi_\eta : \mathbb{R} \to [0,1]$ be a fixed continuous cutoff function which equals 1 on $[\eta, 1 - \eta]$ and vanishes outside of $[-\eta, 1 + \eta]$, and define $\Psi_A(x) := \psi_\eta(x/\varepsilon - n - \alpha)$. Then it is clear that $\Psi_A$ ranges over a compact subset $E_{\varepsilon,\eta}$ of $C^0(I)$ (because $n$ and $\alpha$ are bounded). Furthermore from (7.3) it is clear that we have (7.2). The claim follows. □
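The construction in this proof can be imitated numerically. The sketch below is an illustration under simplifying assumptions: the shift $\alpha$ is chosen by minimising over a finite grid (a stand-in for the pigeonhole argument, not the same thing), the cutoff `psi` is one particular piecewise-linear choice of $\psi_\eta$, and all function names are hypothetical.

```python
import math

def level_set_atoms(G, eps, alpha):
    """Atoms G^{-1}([eps(n+alpha), eps(n+1+alpha))), keyed by the integer n."""
    atoms = {}
    for x, g in enumerate(G):
        atoms.setdefault(math.floor(g / eps - alpha), []).append(x)
    return atoms

def boundary_mass(G, nu, eps, eta, alpha):
    """Left-hand side of (7.3): the (nu+1)-mass of points with G(x) within eps*eta of an endpoint."""
    total = 0.0
    for g, v in zip(G, nu):
        t = g / eps - alpha
        if abs(t - round(t)) <= eta:
            total += v + 1
    return total / len(G)

def choose_alpha(G, nu, eps, eta, grid=100):
    """Pick the shift alpha on a grid in [0, 1) with the least boundary mass."""
    return min((j / grid for j in range(grid)),
               key=lambda a: boundary_mass(G, nu, eps, eta, a))

def psi(t, eta):
    """Continuous cutoff: 1 on [eta, 1-eta], 0 outside [-eta, 1+eta], linear in between."""
    if t <= -eta or t >= 1 + eta:
        return 0.0
    if eta <= t <= 1 - eta:
        return 1.0
    return (t + eta) / (2 * eta) if t < eta else (1 + eta - t) / (2 * eta)

N = 101
G = [3 * math.sin(2 * math.pi * x / N) for x in range(N)]
nu = [1.0] * N                      # toy measure, identically 1
eps, eta = 0.5, 0.1
alpha = choose_alpha(G, nu, eps, eta)
atoms = level_set_atoms(G, eps, alpha)

# On each atom G varies by at most eps, which is what yields (7.1).
print(max(max(G[x] for x in A) - min(G[x] for x in A) for A in atoms.values()) <= eps)
```

For each atom the function $1_A - \Psi_A(G)$ is supported on the boundary region counted by `boundary_mass`, which is the mechanism behind (7.2).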

We now specialise to the case when the functions G are basic Gowers anti-uniform 
functions. 

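Proposition 7.3. Let $\nu$ be a $k$-pseudorandom measure. Let $K \ge 1$ be a fixed integer and let $\mathcal{D}F_1, \ldots, \mathcal{D}F_K \in L^\infty(\mathbb{Z}_N)$ be basic Gowers anti-uniform functions. Let $0 < \varepsilon < 1$ and $0 < \eta < 1/2$ be parameters, and let $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_j)$, $j = 1, \ldots, K$, be constructed as in Proposition 7.2. Let $\mathcal{B} := \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_1) \vee \ldots \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_K)$. Then if $\eta < \eta_0(\varepsilon, K)$ is sufficiently small and $N > N_0(\varepsilon, K, \eta)$ is sufficiently large we have

$$\|\mathcal{D}F_j - \mathbb{E}(\mathcal{D}F_j | \mathcal{B})\|_{L^\infty(\mathbb{Z}_N)} \le \varepsilon \quad \text{for all } 1 \le j \le K. \qquad (7.4)$$

Furthermore there exists a set $\Omega$ which lies in $\mathcal{B}$ such that

$$\mathbb{E}((\nu + 1) 1_\Omega) = O_{K,\varepsilon}(\eta^{1/2}) \qquad (7.5)$$

and such that

$$\|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1 | \mathcal{B})\|_{L^\infty(\mathbb{Z}_N)} = O_{K,\varepsilon}(\eta^{1/2}). \qquad (7.6)$$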

Remark. We strongly recommend that here and in subsequent arguments the reader pretend that the exceptional set $\Omega$ is empty; in practice we shall be able to set $\eta$ small enough that the contribution of $\Omega$ to our calculations will be negligible.

Proof. The claim (7.4) follows immediately from (7.1). Now we prove (7.5) and (7.6). Since each of the $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_j)$ are generated by $O(1/\varepsilon)$ atoms, we see that $\mathcal{B}$ is generated by $O_{K,\varepsilon}(1)$ atoms. Call an atom $A$ of $\mathcal{B}$ small if $\mathbb{E}((\nu + 1) 1_A) \le \eta^{1/2}$, and let $\Omega$ be the union of all the small atoms. Then clearly $\Omega$ lies in $\mathcal{B}$ and obeys (7.5). To prove the remaining claim (7.6), it suffices to show that

$$\frac{\mathbb{E}((\nu - 1) 1_A)}{\mathbb{E}(1_A)} = \mathbb{E}(\nu - 1 | A) = o_{K,\varepsilon,\eta}(1) + O_{K,\varepsilon}(\eta^{1/2}) \qquad (7.7)$$

for all atoms $A$ in $\mathcal{B}$ which are not small. However, by definition of "small" we have

$$\mathbb{E}((\nu - 1) 1_A) + 2\mathbb{E}(1_A) = \mathbb{E}((\nu + 1) 1_A) \ge \eta^{1/2}.$$

Thus to complete the proof of (7.7) it will suffice (since $\eta$ is small and $N$ is large) to show that

$$\mathbb{E}((\nu - 1) 1_A) = o_{K,\varepsilon,\eta}(1) + O_{K,\varepsilon}(\eta). \qquad (7.8)$$

On the other hand, since $A$ is the intersection of $K$ atoms $A_1, \ldots, A_K$ from $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_1), \ldots, \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_K)$ respectively, we see from Proposition 7.2 and an easy induction argument (involving Hölder's inequality and the triangle inequality) that we can find a continuous function $\Psi_A : I^K \to [0,1]$ such that

$$\|(\nu + 1)(1_A - \Psi_A(\mathcal{D}F_1, \ldots, \mathcal{D}F_K))\|_{L^1(\mathbb{Z}_N)} = O_K(\eta),$$

so in particular

$$\|(\nu - 1)(1_A - \Psi_A(\mathcal{D}F_1, \ldots, \mathcal{D}F_K))\|_{L^1(\mathbb{Z}_N)} = O_K(\eta).$$

Furthermore one can easily ensure that $\Psi_A$ lives in a compact set $E_{\varepsilon,\eta,K}$ of $C^0(I^K)$. From this and Proposition 6.2 we have

$$\mathbb{E}((\nu - 1) \Psi_A(\mathcal{D}F_1, \ldots, \mathcal{D}F_K)) = o_{K,\varepsilon,\eta}(1)$$

since $N$ is assumed large depending on $K, \varepsilon, \eta$, and the claim (7.8) now follows from the triangle inequality. □
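The definition of the exceptional set $\Omega$ in this proof is simple enough to write down directly. The following is a small sketch with made-up data (the measure `nu` and the partition `atoms` are arbitrary toy inputs, and `exceptional_set` is a hypothetical name); it shows why (7.5) holds with a constant equal to the number of atoms.

```python
def exceptional_set(nu, atoms, eta):
    """Union of the 'small' atoms A, i.e. those with E((nu+1) 1_A) <= eta^{1/2}."""
    N = len(nu)
    omega = []
    for A in atoms:
        mass = sum(nu[x] + 1 for x in A) / N      # E((nu + 1) 1_A)
        if mass <= eta ** 0.5:
            omega.extend(A)
    return sorted(omega)

N = 20
nu = [1.0] * N                                    # toy measure
atoms = [[0], [1, 2], list(range(3, N))]          # toy partition with 3 atoms
eta = 0.09                                        # threshold eta^{1/2} = 0.3
omega = exceptional_set(nu, atoms, eta)
mass_omega = sum(nu[x] + 1 for x in omega) / N

# Each small atom contributes at most eta^{1/2}, so E((nu+1) 1_Omega) <= (#atoms) * eta^{1/2}.
print(omega, mass_omega <= len(atoms) * eta ** 0.5)
```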

Remarks. This σ-algebra $\mathcal{B}$ is closely related to the (relatively) compact σ-algebras studied in the ergodic theory proof of Szemerédi's theorem, see for instance [TUHH]. In the case $k = 3$ they are closely connected to the Kronecker factor of an ergodic system, and for higher $k$ they are related to $(k - 2)$-step nilsystems, see e.g. [2"%l 133].

8. A FURSTENBERG TOWER, AND THE PROOF OF THEOREM 3.5

We now have enough machinery to deduce Theorem 3.5 from Proposition 2.3. The key proposition is the following decomposition, which splits an arbitrary function into Gowers uniform and Gowers anti-uniform components (plus a negligible error).

Proposition 8.1 (Generalised Koopman-von Neumann structure theorem). Let $\nu$ be a $k$-pseudorandom measure, and let $f \in L^1(\mathbb{Z}_N)$ be a non-negative function satisfying $0 \le f(x) \le \nu(x)$ for all $x \in \mathbb{Z}_N$. Let $0 < \varepsilon < 1$ be a small parameter, and assume $N > N_0(\varepsilon)$ is sufficiently large. Then there exists a σ-algebra $\mathcal{B}$ and an exceptional set $\Omega \in \mathcal{B}$ such that

• (smallness condition)

$$\mathbb{E}(\nu 1_\Omega) = o_\varepsilon(1); \qquad (8.1)$$

• ($\nu$ is uniformly distributed outside of $\Omega$)

$$\|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1 | \mathcal{B})\|_{L^\infty} = o_\varepsilon(1) \qquad (8.2)$$

and

• (Gowers uniformity estimate)

$$\|(1 - 1_\Omega)(f - \mathbb{E}(f | \mathcal{B}))\|_{U^{k-1}} \le \varepsilon^{1/2^k}. \qquad (8.3)$$

Remarks. As in the previous section, the exceptional set $\Omega$ should be ignored on a first reading. The ordinary Koopman-von Neumann theory in ergodic theory asserts, among other things, that any function $f$ on a measure-preserving system $(X, \mathcal{B}, T, \mu)$ can be orthogonally decomposed into a "weakly mixing part" $f - \mathbb{E}(f|\mathcal{B})$ (in which $f - \mathbb{E}(f|\mathcal{B})$ is asymptotically orthogonal to its shifts $T^n(f - \mathbb{E}(f|\mathcal{B}))$ on the average) and an "almost periodic part" $\mathbb{E}(f|\mathcal{B})$ (whose shifts form a precompact set); here $\mathcal{B}$ is the Kronecker factor, i.e. the σ-algebra generated by the almost periodic functions (or equivalently, by the eigenfunctions of $T$). This is somewhat related to the $k = 3$ case of the above Proposition, hence our labeling of that proposition as a generalised Koopman-von Neumann theorem. A slightly more quantitative analogy for the $k = 3$ case would be the assertion that any function bounded by a pseudorandom measure can be decomposed into a Gowers uniform component with small Fourier coefficients, and a Gowers anti-uniform component which consists of only a few Fourier coefficients (and
in particular is bounded). For related ideas see [5j [2TI [23] . 

Proof of Theorem 3.5 assuming Proposition 8.1. Let $f$, $\delta$ be as in Theorem 3.5, and let $0 < \varepsilon \ll \delta$ be a parameter to be chosen later. Let $\mathcal{B}$ be as in the above decomposition, and write $f_U := (1 - 1_\Omega)(f - \mathbb{E}(f|\mathcal{B}))$ and $f_{U^\perp} := (1 - 1_\Omega)\mathbb{E}(f|\mathcal{B})$ (the subscript $U$ stands for Gowers uniform, and $U^\perp$ for Gowers anti-uniform). Observe from (8.1), (3.7), (3.8) and the measurability of $\Omega$ that

$$\mathbb{E}(f_{U^\perp}) = \mathbb{E}((1 - 1_\Omega) f) \ge \mathbb{E}(f) - \mathbb{E}(\nu 1_\Omega) \ge \delta - o_\varepsilon(1).$$

Also, by (8.2) we see that $f_{U^\perp}$ is bounded above by $1 + o_\varepsilon(1)$. Since $f$ is non-negative, $f_{U^\perp}$ is also. We may thus$^{16}$ apply Proposition 2.3 to obtain

$$\mathbb{E}(f_{U^\perp}(x) f_{U^\perp}(x + r) \cdots f_{U^\perp}(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) \ge c(k, \delta) - o_\varepsilon(1) - o_{k,\delta}(1).$$

On the other hand, from (8.3) we have $\|f_U\|_{U^{k-1}} \le \varepsilon^{1/2^k}$; since $(1 - 1_\Omega) f$ is bounded by $\nu$ and $f_{U^\perp}$ is bounded by $1 + o_\varepsilon(1)$, we thus see that $f_U$ is pointwise bounded by $\nu + 1 + o_\varepsilon(1)$. Applying the generalised von Neumann theorem (Proposition 5.3) we thus see that

$$\mathbb{E}(f_0(x) f_1(x + r) \cdots f_{k-1}(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) = O(\varepsilon^{1/2^k}) + o_\varepsilon(1)$$

whenever each $f_j$ is equal to $f_U$ or $f_{U^\perp}$, with at least one $f_j$ equal to $f_U$. Adding these two estimates together we obtain

$$\mathbb{E}(\tilde f(x) \tilde f(x + r) \cdots \tilde f(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) \ge c(k, \delta) - O(\varepsilon^{1/2^k}) - o_\varepsilon(1) - o_{k,\delta}(1),$$

where $\tilde f := f_U + f_{U^\perp} = (1 - 1_\Omega) f$. But since $0 \le (1 - 1_\Omega) f \le f$ we obtain

$$\mathbb{E}(f(x) f(x + r) \cdots f(x + (k-1)r) \mid x, r \in \mathbb{Z}_N) \ge c(k, \delta) - O(\varepsilon^{1/2^k}) - o_\varepsilon(1) - o_{k,\delta}(1).$$

Since $\varepsilon$ can be made arbitrarily small (as long as $N$ is taken sufficiently large), the error terms on the right-hand side can be taken to be arbitrarily small by choosing $N$ sufficiently large depending on $k$ and $\delta$. The claim follows. □
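For readers who want the "adding these estimates together" step spelled out, the following worked expansion may help. It is a sketch of the bookkeeping only: the factor $2^k - 1$ counts the mixed terms and depends only on $k$, so it is absorbed into the implied constants.

```latex
\begin{align*}
\mathbb{E}\Big(\prod_{j=0}^{k-1} \tilde f(x+jr)\Big)
  &= \mathbb{E}\Big(\prod_{j=0}^{k-1} f_{U^\perp}(x+jr)\Big)
   + \sum_{\substack{(f_0,\dots,f_{k-1}) \in \{f_U,\, f_{U^\perp}\}^k \\ \text{some } f_j = f_U}}
     \mathbb{E}\Big(\prod_{j=0}^{k-1} f_j(x+jr)\Big) \\
  &\ge c(k,\delta) - o_\varepsilon(1) - o_{k,\delta}(1)
   - (2^k - 1)\big(O(\varepsilon^{1/2^k}) + o_\varepsilon(1)\big).
\end{align*}
```

Here all expectations are over $x, r \in \mathbb{Z}_N$, the first term is handled by Proposition 2.3, and each of the $2^k - 1$ mixed terms is handled by the generalised von Neumann theorem.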

To complete the proof of Theorem 3.5, it suffices to prove Proposition 8.1. To construct the σ-algebra $\mathcal{B}$ required in the Proposition, we will use the philosophy laid out by Furstenberg in his ergodic structure theorem (see (TQlIlH]), which decomposes any measure-preserving system into a weakly-mixing extension of a tower of compact extensions. In our setting, the idea is roughly speaking as follows. We initialise $\mathcal{B}$ to be the trivial σ-algebra $\mathcal{B} = \{\emptyset, \mathbb{Z}_N\}$. If the function $f - \mathbb{E}(f|\mathcal{B})$ is already Gowers uniform (in the sense of (8.3)), then we can terminate the algorithm. Otherwise, we use the machinery of dual functions, developed in §6, to locate a Gowers anti-uniform function $\mathcal{D}F_1$ which has some non-trivial correlation with $f$, and add the level sets of $\mathcal{D}F_1$ to the σ-algebra $\mathcal{B}$; the non-trivial correlation property will ensure that the $L^2$ norm of $\mathbb{E}(f|\mathcal{B})$ increases by a non-trivial amount during this procedure, while the pseudorandomness of $\nu$ will ensure that $\mathbb{E}(f|\mathcal{B})$ remains uniformly bounded. We then repeat the above algorithm until $f - \mathbb{E}(f|\mathcal{B})$ becomes sufficiently Gowers uniform, at which point we terminate the algorithm. In the original ergodic theory arguments of Furstenberg this algorithm was not guaranteed to terminate, and indeed one required the axiom of choice (in the guise of Zorn's lemma) in order to conclude$^{17}$ the structure theorem. However, in our setting we can terminate in a bounded number of steps (in fact in at most $2^{2^k}/\varepsilon + 2$ steps), because there is a quantitative $L^2$-increment to the bounded function $\mathbb{E}(f|\mathcal{B})$ at each stage.

16 There is an utterly trivial issue which we have ignored here, which is that $f_{U^\perp}$ is not bounded above by 1 but by $1 + o_\varepsilon(1)$, and that the density is bounded below by $\delta - o_\varepsilon(1)$ rather than $\delta$. One can easily get around this by modifying $f_{U^\perp}$ by $o_\varepsilon(1)$ before applying Proposition 2.3, incurring a net error of $o_\varepsilon(1)$ at the end since $f_{U^\perp}$ is bounded.

Such a strategy will be familiar to any reader acquainted with the proof of Szemerédi's regularity lemma [39]. This is no coincidence: there is in fact a close connection between regularity lemmas such as those in [20, 22, 39] and ergodic theory of the type we have brushed up against in this paper. Indeed there are strong analogies between all of the known proofs of Szemerédi's theorem, despite the fact that they superficially appear to use very different techniques.

We turn to the details. To prove Proposition 8.1 we will iterate the following somewhat technical Proposition, which can be thought of as a σ-algebra variant of Lemma 6.1.

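Proposition 8.2 (Iterative Step). Let $\nu$ be a $k$-pseudorandom measure, and let $f \in L^1(\mathbb{Z}_N)$ be a non-negative function satisfying $0 \le f(x) \le \nu(x)$ for all $x \in \mathbb{Z}_N$. Let $0 < \eta \ll \varepsilon \ll 1$ be small numbers, and let $K \ge 0$ be an integer. Suppose that $\eta < \eta_0(\varepsilon, K)$ is sufficiently small and that $N > N_0(\varepsilon, K, \eta)$ is sufficiently large. Let $F_1, \ldots, F_K \in L^1(\mathbb{Z}_N)$ be a collection of functions obeying the pointwise bounds

$$|F_j(x)| \le (1 + O_{K,\varepsilon}(\eta^{1/2}))(\nu(x) + 1) \qquad (8.4)$$

for all $1 \le j \le K$ and $x \in \mathbb{Z}_N$. Let $\mathcal{B}_K$ be the σ-algebra

$$\mathcal{B}_K := \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_1) \vee \ldots \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_K) \qquad (8.5)$$

where $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_j)$ is as in Proposition 7.2, and suppose that there exists a set $\Omega_K$ in $\mathcal{B}_K$ obeying

• (smallness bound)

$$\mathbb{E}((\nu + 1) 1_{\Omega_K}) = O_{K,\varepsilon}(\eta^{1/2}) \qquad (8.6)$$

and

• (uniform distribution bound)

$$\|(1 - 1_{\Omega_K})\, \mathbb{E}(\nu - 1 | \mathcal{B}_K)\|_{L^\infty(\mathbb{Z}_N)} = O_{K,\varepsilon}(\eta^{1/2}). \qquad (8.7)$$

Set

$$F_{K+1} := (1 - 1_{\Omega_K})(f - \mathbb{E}(f | \mathcal{B}_K)) \qquad (8.8)$$

and suppose that $F_{K+1}$ obeys the non-Gowers-uniformity estimate

$$\|F_{K+1}\|_{U^{k-1}} > \varepsilon^{1/2^k}. \qquad (8.9)$$

Then we have the estimates

$$\|(1 - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)\|_{L^\infty(\mathbb{Z}_N)} \le 1 + O_{K,\varepsilon}(\eta^{1/2}) \qquad (8.10)$$

and

$$|F_{K+1}(x)| \le (1 + O_{K,\varepsilon}(\eta^{1/2}))(\nu(x) + 1). \qquad (8.11)$$

Furthermore, if we let $\mathcal{B}_{K+1}$ be the σ-algebra

$$\mathcal{B}_{K+1} := \mathcal{B}_K \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_{K+1}) = \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_1) \vee \ldots \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}F_{K+1}) \qquad (8.12)$$

then there exists a set $\Omega_{K+1} \supseteq \Omega_K$ obeying

• (smallness bound)

$$\mathbb{E}((\nu + 1) 1_{\Omega_{K+1}}) = O_{K,\varepsilon}(\eta^{1/2}); \qquad (8.13)$$

• (uniform distribution bound)

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(\nu - 1 | \mathcal{B}_{K+1})\|_{L^\infty(\mathbb{Z}_N)} = O_{K,\varepsilon}(\eta^{1/2}); \qquad (8.14)$$

and such that we have

• (energy increment property)

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_{K+1})\|^2_{L^2(\mathbb{Z}_N)} \ge \|(1 - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)} + 2^{-2^k + 1} \varepsilon. \qquad (8.15)$$

17 For the specific purpose of $k$-term recurrence, i.e. finding progressions of length $k$, one only needs to run Furstenberg's algorithm for a finite number of steps depending on $k$, and so Zorn's lemma is not needed in this application. We thank Bryna Kra for pointing out this subtlety.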



Remark. If we ignore the exceptional sets $\Omega_K$, $\Omega_{K+1}$, this proposition is asserting the following: if $f$ is not "relatively weakly mixing" with respect to the σ-algebra $\mathcal{B}_K$, in the sense that the component $f - \mathbb{E}(f|\mathcal{B}_K)$ of $f$ which is orthogonal to $\mathcal{B}_K$ is not Gowers-uniform, then we can refine $\mathcal{B}_K$ to a slightly more complex σ-algebra $\mathcal{B}_{K+1}$ such that the $L^2(\mathbb{Z}_N)$ norm (energy) of $\mathbb{E}(f|\mathcal{B}_{K+1})$ is larger than that of $\mathbb{E}(f|\mathcal{B}_K)$ by some quantitative amount. Furthermore, $\nu$ remains uniformly distributed with respect to $\mathcal{B}_{K+1}$.

Proof of Proposition 8.1 assuming Proposition 8.2. Fix $\varepsilon$, and let $K_0$ be the smallest integer greater than $2^{2^k}/\varepsilon + 1$; this quantity will be the upper bound for the number of iterations of an algorithm which we shall give shortly. We shall need a parameter $0 < \eta \ll \varepsilon$ which we shall choose later (we assume $\eta < \eta_0(\varepsilon, K_0)$), and then we shall assume $N > N_0(\eta, \varepsilon)$ is sufficiently large.

To construct $\mathcal{B}$ and $\Omega$ we shall, for some $K \in [0, K_0]$, iteratively construct a sequence of basic Gowers anti-uniform functions $\mathcal{D}F_1, \ldots, \mathcal{D}F_K$ on $\mathbb{Z}_N$ together with exceptional sets $\Omega_0 \subseteq \Omega_1 \subseteq \ldots \subseteq \Omega_K \subseteq \mathbb{Z}_N$ in the following manner.

• Step 0. Initialise $K = 0$ and $\Omega_0 := \emptyset$. (We will later increment the value of $K$).

• Step 1. Let $\mathcal{B}_K$ and $F_{K+1}$ be defined by (8.5) and (8.8) respectively. Thus for instance when $K = 0$ we will have $\mathcal{B}_0 = \{\emptyset, \mathbb{Z}_N\}$ and $F_1 = f - \mathbb{E}(f)$. Observe that in the $K = 0$ case, the estimates (8.4), (8.6), (8.7) are trivial (the latter bound following from (2.4)). As we shall see, these three estimates will be preserved throughout the algorithm.

• Step 2. If the estimate (8.9) fails, or in other words that

$$\|F_{K+1}\|_{U^{k-1}} \le \varepsilon^{1/2^k},$$

then we set $\Omega := \Omega_K$ and $\mathcal{B} := \mathcal{B}_K$, and successfully terminate the algorithm.

• Step 3. If instead (8.9) holds, then we define $\mathcal{B}_{K+1}$ by (8.12). (Here we of course need $K \le K_0$, but this will be guaranteed by Step 4 below). We then invoke Proposition 8.2 to locate an exceptional set $\Omega_{K+1} \supseteq \Omega_K$ in $\mathcal{B}_{K+1}$ obeying the conditions (8.13), (8.14), and for which we have the energy increment property (8.15). Also, we have (8.11).

• Step 4. Increment $K$ to $K + 1$; observe from construction that the estimates (8.4), (8.6), (8.7) will be preserved when doing so. If we now have $K > K_0$, then we terminate the algorithm with an error; otherwise, return to Step 1.
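The loop structure of Steps 0 to 4 can be sketched in code for a toy model. The example below makes strong simplifying assumptions that are not part of the paper's argument: it takes $k = 3$ (so the relevant norm is $U^2$), $\nu \equiv 1$, empty exceptional sets $\Omega_K$, shift $\alpha = 0$ in the level sets, and free parameters `bin_width`, `threshold` and `max_steps` in place of $\varepsilon$, $\varepsilon^{1/2^k}$ and $K_0$. All function names are hypothetical.

```python
import cmath
import math

def fourier(F):
    """Normalised Fourier transform on Z_N: F^(xi) = E(F(x) e(-x xi / N))."""
    N = len(F)
    return [sum(F[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def u2_norm(F):
    """Gowers U^2 norm of a real function, via ||F||_{U^2}^4 = sum |F^(xi)|^4."""
    return sum(abs(c) ** 4 for c in fourier(F)) ** 0.25

def dual_function(F):
    """k = 3 dual function DF(x) = E(F(x+h1) F(x+h2) F(x+h1+h2) | h1, h2)."""
    N = len(F)
    return [sum(F[(x + h1) % N] * F[(x + h2) % N] * F[(x + h1 + h2) % N]
                for h1 in range(N) for h2 in range(N)) / N ** 2
            for x in range(N)]

def cond_exp(f, labels):
    """E(f | B), where the atoms of B are the sets of points sharing a label."""
    sums, counts = {}, {}
    for x, lab in enumerate(labels):
        sums[lab] = sums.get(lab, 0.0) + f[x]
        counts[lab] = counts.get(lab, 0) + 1
    return [sums[lab] / counts[lab] for lab in labels]

def energy_increment_toy(f, bin_width, threshold, max_steps):
    N = len(f)
    labels = [()] * N                        # Step 0: trivial sigma-algebra, one atom
    energies = []
    for _ in range(max_steps):
        Ef = cond_exp(f, labels)             # Step 1: E(f | B_K) and F_{K+1}
        energies.append(sum(v * v for v in Ef) / N)
        F = [a - b for a, b in zip(f, Ef)]
        if u2_norm(F) <= threshold:          # Step 2: F_{K+1} is uniform, stop
            return labels, energies, True
        DF = dual_function(F)                # Step 3: refine by level sets of DF_{K+1}
        labels = [lab + (math.floor(d / bin_width),) for lab, d in zip(labels, DF)]
    return labels, energies, False           # Step 4: step budget exhausted

N = 31
f = [(1 + math.cos(2 * math.pi * 3 * x / N)) / 2 for x in range(N)]
labels, energies, uniform = energy_increment_toy(f, bin_width=0.01, threshold=0.1, max_steps=5)
print(uniform, energies)
```

In this toy setting each σ-algebra refines the previous one, so the recorded energies $\|\mathbb{E}(f|\mathcal{B}_K)\|_{L^2}^2$ are non-decreasing by Pythagoras; the quantitative increment (8.15) is what the paper needs in the genuine setting, where the exceptional sets and the unbounded measure $\nu$ complicate matters.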

Remarks. The integer $K$ indexes the iteration number of the algorithm, thus we begin with the zeroth iteration when $K = 0$, then the first iteration when $K = 1$, etc. It is worth noting that apart from $O_{K,\varepsilon}(\eta^{1/2})$ error terms, none of the bounds we will encounter while executing this algorithm will actually depend on $K$. As we shall see, this algorithm will terminate well before $K$ reaches $K_0$ (in fact, for the application to the primes, the Hardy-Littlewood prime tuples conjecture implies that this algorithm will terminate at the first step $K = 0$).

Assuming Proposition 8.2, we see that this algorithm will terminate after finitely many steps with one of two outcomes: either it will terminate successfully in Step 2 for some $K \le K_0$, or else it will terminate with an error in Step 4 when $K$ exceeds $K_0$. Assume for the moment that the former case occurs. Then it is clear that at the successful conclusion of this algorithm, we will have generated a σ-algebra $\mathcal{B}$ and an exceptional set $\Omega$ with the properties required for Proposition 8.1, with error terms of $O_{K,\varepsilon}(\eta^{1/2})$ instead of $o_\varepsilon(1)$, if $N > N_0(\eta, K, \varepsilon)$. But by making $\eta$ decay sufficiently slowly to zero, we can replace the $O_{K,\varepsilon}(\eta^{1/2})$ bounds by $o_\varepsilon(1)$; note that the dependence of the error terms on $K$ will not be relevant since $K$ is bounded by $K_0$, which depends only on $\varepsilon$.

To conclude the proof of Proposition 8.1 it will thus suffice to show that the above algorithm does not terminate with an error. Suppose for a contradiction that the algorithm ran until the $K_0^{\text{th}}$ iteration before terminating with an error in Step 4. Then if we define the energies $E_K$ for $0 \le K \le K_0 + 1$ by the formula

$$E_K := \|(1 - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)}$$

then we see from (8.15) that

$$E_{K+1} \ge E_K + 2^{-2^k + 1} \varepsilon \quad \text{for all } 0 \le K \le K_0 \qquad (8.16)$$

(for instance). Also, by (8.10) we have

$$0 \le E_K \le 1 + O_{K,\varepsilon}(\eta^{1/2}) \quad \text{for all } 0 \le K \le K_0.$$

If $\eta < \eta_0(K, \varepsilon)$ is sufficiently small, these last two statements contradict one another for $K = K_0$. Thus the above algorithm cannot reach the $K_0^{\text{th}}$ iteration, and instead terminates successfully at Step 2. This completes the proof of Proposition 8.1. □
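The contradiction at $K = K_0$ comes from a short count, which can be written out as follows. This is only a sketch of the arithmetic, using $E_0 \ge 0$, the increment (8.16) applied $K_0$ times, and the choice of $K_0$ as the smallest integer greater than $2^{2^k}/\varepsilon + 1$.

```latex
\begin{align*}
E_{K_0} \;\ge\; E_0 + K_0 \cdot 2^{-2^k+1}\varepsilon
        \;>\; \Big(\frac{2^{2^k}}{\varepsilon} + 1\Big)\, 2^{-2^k+1}\varepsilon
        \;=\; 2 + 2^{-2^k+1}\varepsilon \;>\; 2,
\end{align*}
```

whereas the upper bound gives $E_{K_0} \le 1 + O_{K_0,\varepsilon}(\eta^{1/2}) < 2$ once $\eta$ is small enough.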

The only remaining task is to prove Proposition 8.2.

Proof of Proposition 8.2. Let $\nu$, $f$, $K$, $\varepsilon$, $\eta$, $F_1, \ldots, F_K$, $F_{K+1}$, $\mathcal{B}_K$, $\mathcal{B}_{K+1}$ be as in the proposition. We begin by proving the bounds (8.10), (8.11). From (8.7) we have

$$\|(1 - 1_{\Omega_K})\, \mathbb{E}(\nu | \mathcal{B}_K)\|_{L^\infty} \le 1 + O_{K,\varepsilon}(\eta^{1/2});$$

since $f$ is non-negative and bounded pointwise by $\nu$, we obtain (8.10). The bound (8.11) then follows from (8.10) and (8.8), where we again use that $f$ is non-negative and bounded pointwise by $\nu$. This shows in particular that $\mathcal{D}F_1, \ldots, \mathcal{D}F_{K+1}$ are basic Gowers anti-uniform functions (up to multiplicative errors of $1 + O_{K,\varepsilon}(\eta^{1/2})$, which are negligible).

18 Of course, the constants in the $O()$ bounds are different at each stage of this iteration, but we are allowing these constants to depend on $K$, and $K$ will ultimately be bounded by $K_0$, which depends only on $\varepsilon$ and $k$.

Applying Lemma 6.1 (scaling out the multiplicative error of $1 + O_{K,\varepsilon}(\eta^{1/2})$) and using
flOD and (gZEI$ we conclude that 

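$$\|\mathcal{D}F_j\|_{L^\infty(\mathbb{Z}_N)} \le 2^{2^{k-1} - 1} + O_{K,\varepsilon}(\eta^{1/2}) \quad \text{for all } 1 \le j \le K + 1, \qquad (8.17)$$

since we are assuming $N$ to be large depending on $K$, $\varepsilon$, $\eta$.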

We now apply Proposition 7.3 (absorbing the multiplicative errors of $1 + O_{K,\varepsilon}(\eta^{1/2})$) to conclude that we may find a set $\Omega$ in $\mathcal{B}_{K+1}$ such that

$$\mathbb{E}((\nu + 1) 1_\Omega) = O_{K,\varepsilon}(\eta^{1/2})$$

and

$$\|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1 | \mathcal{B}_{K+1})\|_{L^\infty} = O_{K,\varepsilon}(\eta^{1/2}).$$

If we then set $\Omega_{K+1} := \Omega_K \cup \Omega$, then we can verify (8.13) and (8.14) from (8.6) and

(EZD. 

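It remains to verify (8.15), the energy increment property, that is to say the statement

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_{K+1})\|^2_{L^2(\mathbb{Z}_N)} \ge \|(1 - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)} + 2^{-2^k + 1} \varepsilon. \qquad (8.18)$$

To do this we exploit the hypothesis $\|F_{K+1}\|_{U^{k-1}} > \varepsilon^{1/2^k}$, which was (8.9). By Lemma 6.1 and the definition (8.8) we have

$$|\langle (1 - 1_{\Omega_K})(f - \mathbb{E}(f | \mathcal{B}_K)), \mathcal{D}F_{K+1} \rangle| = |\langle F_{K+1}, \mathcal{D}F_{K+1} \rangle| = \|F_{K+1}\|^{2^{k-1}}_{U^{k-1}} \ge \varepsilon^{1/2}.$$

On the other hand, from the bounds (8.4), (8.6) and (8.17) we have

$$|\langle (1_{\Omega_{K+1}} - 1_{\Omega_K})(f - \mathbb{E}(f | \mathcal{B}_K)), \mathcal{D}F_{K+1} \rangle| \le \|\mathcal{D}F_{K+1}\|_\infty\, \mathbb{E}((1_{\Omega_{K+1}} - 1_{\Omega_K}) |f - \mathbb{E}(f | \mathcal{B}_K)|)$$
$$\le O_{K,\varepsilon}(1)\, \mathbb{E}((1_{\Omega_{K+1}} - 1_{\Omega_K})(\nu + 1)) = O_{K,\varepsilon}(\eta^{1/2}),$$

while from (7.1) and (8.10) we have

$$|\langle (1 - 1_{\Omega_{K+1}})(f - \mathbb{E}(f | \mathcal{B}_K)), \mathcal{D}F_{K+1} - \mathbb{E}(\mathcal{D}F_{K+1} | \mathcal{B}_{K+1}) \rangle|$$
$$\le \|\mathcal{D}F_{K+1} - \mathbb{E}(\mathcal{D}F_{K+1} | \mathcal{B}_{K+1})\|_\infty\, \mathbb{E}((1 - 1_{\Omega_{K+1}}) |f - \mathbb{E}(f | \mathcal{B}_K)|)$$
$$\le O(\varepsilon)\, \mathbb{E}((1 - 1_{\Omega_{K+1}})(\nu + 1)) = O(\varepsilon).$$

By the triangle inequality we thus have

$$|\langle (1 - 1_{\Omega_{K+1}})(f - \mathbb{E}(f | \mathcal{B}_K)), \mathbb{E}(\mathcal{D}F_{K+1} | \mathcal{B}_{K+1}) \rangle| \ge \varepsilon^{1/2} - O_{K,\varepsilon}(\eta^{1/2}) - O(\varepsilon).$$

But since $(1 - 1_{\Omega_{K+1}})$, $\mathbb{E}(\mathcal{D}F_{K+1} | \mathcal{B}_{K+1})$, and $\mathbb{E}(f | \mathcal{B}_K)$ are all measurable in $\mathcal{B}_{K+1}$, we can replace $f$ by $\mathbb{E}(f | \mathcal{B}_{K+1})$, and so

$$|\langle (1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K)), \mathbb{E}(\mathcal{D}F_{K+1} | \mathcal{B}_{K+1}) \rangle| \ge \varepsilon^{1/2} - O_{K,\varepsilon}(\eta^{1/2}) - O(\varepsilon).$$

By the Cauchy-Schwarz inequality and (8.17) we obtain

$$\|(1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K))\|_{L^2(\mathbb{Z}_N)} \ge 2^{-2^{k-1} + 1} \varepsilon^{1/2} - O_{K,\varepsilon}(\eta^{1/2}) - O(\varepsilon). \qquad (8.19)$$

Morally speaking, this implies (8.15) thanks to Pythagoras's theorem, but the presence of the exceptional sets $\Omega_K$ and $\Omega_{K+1}$ means that we have to exercise caution, especially since we have no $L^2$ control on $\nu$.

Recalling that $\mathbb{E}(f | \mathcal{B}_K) \le 1 + O_{K,\varepsilon}(\eta^{1/2})$ outside $\Omega_K$ (cf. (8.10)), we observe that if $\eta < \eta_0(\varepsilon, K)$ is sufficiently small then

$$\|(1_{\Omega_{K+1}} - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)} \le 2 \|1_{\Omega_{K+1}} - 1_{\Omega_K}\|^2_{L^2(\mathbb{Z}_N)} \le 2 \|1_{\Omega_{K+1}} - 1_{\Omega_K}\|_{L^1(\mathbb{Z}_N)} \le 2\, \mathbb{E}(1_{\Omega_{K+1}}),$$

which, by (8.6), is $O_{K,\varepsilon}(\eta^{1/2})$. By the triangle inequality (and (8.10)) we thus see that to prove (8.15) it will suffice to prove that

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_{K+1})\|^2_{L^2(\mathbb{Z}_N)} \ge \|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)} + 2^{-2^k + 2} \varepsilon - O_{K,\varepsilon}(\eta^{1/2}) - O(\varepsilon^{3/2}),$$

since we can absorb the error terms $-O_{K,\varepsilon}(\eta^{1/2}) - O(\varepsilon^{3/2})$ into the $2^{-2^k + 2} \varepsilon$ term by choosing $\varepsilon$ sufficiently small depending on $k$, and $\eta$ sufficiently small depending on $K$, $\varepsilon$.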

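We write the left-hand side as

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K) + (1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K))\|^2_{L^2(\mathbb{Z}_N)}$$

which can be expanded using the cosine rule as

$$\|(1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K)\|^2_{L^2(\mathbb{Z}_N)} + \|(1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K))\|^2_{L^2(\mathbb{Z}_N)}$$
$$+\ 2 \langle (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K),\ (1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K)) \rangle.$$

Therefore by (8.19) it will suffice to show the approximate orthogonality relationship

$$\langle (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K),\ (1 - 1_{\Omega_{K+1}})(\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K)) \rangle = O_{K,\varepsilon}(\eta^{1/2}).$$

Since $(1 - 1_{\Omega_{K+1}})^2 = (1 - 1_{\Omega_{K+1}})$, the left-hand side can be rewritten as

$$\langle (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K),\ \mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K) \rangle.$$

Now note that $(1 - 1_{\Omega_K})\, \mathbb{E}(f | \mathcal{B}_K)$ is measurable with respect to $\mathcal{B}_K$, and hence orthogonal to $\mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K)$, since $\mathcal{B}_K$ is a sub-σ-algebra of $\mathcal{B}_{K+1}$. Thus the above expression can be rewritten as

$$\langle (1_{\Omega_K} - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K),\ \mathbb{E}(f | \mathcal{B}_{K+1}) - \mathbb{E}(f | \mathcal{B}_K) \rangle.$$

Again, since the left-hand side of the inner product is measurable with respect to $\mathcal{B}_{K+1}$, we can rewrite this as

$$\langle (1_{\Omega_K} - 1_{\Omega_{K+1}})\, \mathbb{E}(f | \mathcal{B}_K),\ f - \mathbb{E}(f | \mathcal{B}_K) \rangle.$$

Since $\mathbb{E}(f | \mathcal{B}_K)(x) \le 2$ if $\eta < \eta_0(\varepsilon, K)$ is sufficiently small and $x \notin \Omega_K$ (cf. (8.10)), we may majorise this in absolute value by

$$2\, \mathbb{E}((1_{\Omega_{K+1}} - 1_{\Omega_K}) |f - \mathbb{E}(f | \mathcal{B}_K)|).$$

Since we are working on the assumption that $0 \le f(x) \le \nu(x)$, we can bound this in turn by

$$2\, \mathbb{E}((1_{\Omega_{K+1}} - 1_{\Omega_K})(\nu + \mathbb{E}(\nu | \mathcal{B}_K))).$$

Since $\mathbb{E}(\nu | \mathcal{B}_K)(x) \le 2$ for $x \notin \Omega_K$ (cf. (8.7)) this is no more than

$$4\, \mathbb{E}((1_{\Omega_{K+1}} - 1_{\Omega_K})(\nu + 1)),$$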

which is $O_{K,\varepsilon}(\eta^{1/2})$ as desired by (8.6). This concludes the proof of Proposition 8.2, and hence Theorem 3.5. □

9. A PSEUDORANDOM MEASURE WHICH MAJORISES THE PRIMES

Having concluded the proof of Theorem 3.5, we are now ready to apply it to the specific situation of locating arithmetic progressions in the primes. As in almost any additive problem involving the primes, we begin by considering the von Mangoldt function $\Lambda$ defined by $\Lambda(n) = \log p$ if $n = p^m$ and 0 otherwise. Actually, for us the higher prime powers $p^2, p^3, \ldots$ will play no role whatsoever and will be discarded very shortly.

From the prime number theorem we know that the average value of $\Lambda(n)$ is $1 + o(1)$. In order to prove Theorem 1.1 (or Theorem 1.2), it would suffice to exhibit a measure $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\nu(n) \ge c(k) \Lambda(n)$ for some $c(k) > 0$ depending only on $k$, and which is $k$-pseudorandom. Unfortunately, such a measure cannot exist because the primes (and the von Mangoldt function) are concentrated on certain residue classes. Specifically, for any integer $q > 1$, $\Lambda$ is only non-zero on those $\phi(q)$ residue classes $a \pmod q$ for which $(a, q) = 1$, whereas a pseudorandom measure can easily be shown to be uniformly distributed across all $q$ residue classes; here of course $\phi(q)$ is the Euler totient function. Since $\phi(q)/q$ can be made arbitrarily small, we therefore cannot hope to obtain a pseudorandom majorant with the desired property $\nu(n) \ge c(k) \Lambda(n)$.
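The claim that $\phi(q)/q$ can be made small is easy to see numerically by taking $q$ to be a product of the first few primes. The sketch below uses a brute-force totient and exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import gcd

def totient(q):
    """Euler totient by brute force: the number of 1 <= a <= q coprime to q."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

def coprime_density(primes):
    """Return (q, phi(q)/q) for q the product of the given primes."""
    q = 1
    for p in primes:
        q *= p
    return q, Fraction(totient(q), q)

# The proportion of residue classes mod q that can contain large primes shrinks as q grows.
for primes in ([2], [2, 3], [2, 3, 5], [2, 3, 5, 7], [2, 3, 5, 7, 11]):
    q, density = coprime_density(primes)
    print(q, density, float(density))
```

For such $q$ the ratio equals $\prod_{p | q} (1 - 1/p)$, which tends to zero as more primes are included (by Mertens' theorem), although only slowly.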

To get around this difficulty we employ a device which we call the $W$-trick$^{19}$, which effectively removes the arithmetic obstructions to pseudorandomness arising from the very small primes. Let $w = w(N)$ be any function tending slowly$^{20}$ to infinity with $N$, so that $1/w(N) = o(1)$, and let $W = \prod_{p \le w(N)} p$ be the product of the primes up to $w(N)$. Define the modified von Mangoldt function $\tilde\Lambda : \mathbb{Z}^+ \to \mathbb{R}^+$ by

$$\tilde\Lambda(n) := \begin{cases} \dfrac{\phi(W)}{W} \log(Wn + 1) & \text{when } Wn + 1 \text{ is prime} \\ 0 & \text{otherwise.} \end{cases}$$

Note that we have discarded the contribution of the prime powers since we ultimately wish to count arithmetic progressions in the primes themselves. This $W$-trick exploits the trivial observation that in order to obtain arithmetic progressions in the primes, it suffices to do so in the modified primes $\{n \in \mathbb{Z} : Wn + 1 \text{ is prime}\}$ (at the cost of reducing the number of such progressions by a polynomial factor in $W$ at worst). We also remark that one could replace $Wn + 1$ here by $Wn + b$ for any integer $1 \le b < W$ coprime to $W$ without affecting the arguments which follow.

19 The reader will observe some similarity between this trick and the use of σ-algebras in the previous section to remove non-Gowers-uniformity from the system. Here, of course, the precise obstruction to non-Gowers-uniformity in the primes is very explicit, whereas the exact structure of the σ-algebras constructed in the previous section is somewhat mysterious. In the specific case of the primes, we expect (through such conjectures as the Hardy-Littlewood prime tuple conjecture) that the primes are essentially uniform once the obstructions from small primes are removed, and hence the algorithm of the previous section should in fact terminate immediately at the $K = 0$ iteration. However we emphasise that our argument does not give (or require) any progress on this very difficult prime tuple conjecture, as we allow $K$ to be non-zero.

20 Actually, it will be clear at the end of the proof that we can in fact take $w$ to be a sufficiently large number independent of $N$, depending only on $k$, however it will be convenient for now to make $w$ slowly growing in $N$ in order to take advantage of the $o(1)$ notation.

Observe that if $w(N)$ is sufficiently slowly growing ($w(N) \ll \log\log N$ will suffice here) then by Dirichlet's theorem concerning the distribution of the primes in arithmetic progressions$^{21}$ such as $\{n : n \equiv 1 \pmod W\}$ we have $\sum_{n \le N} \tilde\Lambda(n) = N(1 + o(1))$. With this modification, we can now majorise the primes by a pseudorandom measure as follows:

Proposition 9.1. Write $\varepsilon_k := 1/2^k (k + 4)!$, and let $N$ be a sufficiently large prime number. Then there is a $k$-pseudorandom measure $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\nu(n) \ge k^{-1} 2^{-k-5} \tilde\Lambda(n)$ for all $\varepsilon_k N \le n \le 2\varepsilon_k N$.

Remark. The purpose of is to assist in dealing with wraparound issues, which arise 
from the fact that we are working on Zjv and not on [—N,N]. Standard sieve theory 
techniques (in particular the "fundamental lemma of sieve theory") can come very close 
to providing such a majorant, but the error terms on the pseudorandomness are not of 
the form $o(1)$ but rather something like $O(2^{-2^{Ck}})$ or so. This unfortunately does not
quite seem to be good enough for our argument, which crucially relies on $o(1)$ type
decay, and so we have to rely instead on recent arguments of Goldston and Yildirim.
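
As an informal numerical sanity check (not part of the argument), the normalisation $\sum_{n \le N} \tilde\Lambda(n) = N(1+o(1))$ behind the $W$-trick can be sketched in Python. The function names `primes_up_to` and `w_trick_average` and the sample values of $N$ and $w$ are illustrative assumptions, and we assume the weight $\tilde\Lambda(n) = \frac{\phi(W)}{W}\log(Wn+1)$ on those $n$ with $Wn+1$ prime.

```python
from math import log

def primes_up_to(limit):
    """Sieve of Eratosthenes: return a bytearray with flag[i] = 1 iff i is prime."""
    flag = bytearray([1]) * (limit + 1)
    flag[0:2] = b"\x00\x00"
    i = 2
    while i * i <= limit:
        if flag[i]:
            # Clear every multiple of i starting from i*i.
            flag[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
        i += 1
    return flag

def w_trick_average(N, w):
    """Return (W, phi(W)/W, (1/N) * sum_{n <= N} Lambda~(n)) for the cutoff w."""
    small = primes_up_to(max(w, 2))
    W, density = 1, 1.0
    for p in range(2, w + 1):
        if small[p]:
            W *= p
            density *= 1.0 - 1.0 / p  # phi(W)/W = prod_{p <= w} (1 - 1/p)
    is_prime = primes_up_to(W * N + 1)
    total = sum(density * log(W * n + 1)
                for n in range(1, N + 1) if is_prime[W * n + 1])
    return W, density, total / N

if __name__ == "__main__":
    # Illustrative parameters only; the average should be close to 1.
    print(w_trick_average(20000, 5))
```

For a fixed small cutoff such as $w = 5$ (so $W = 30$) the average is already close to 1 at moderate $N$, which is the behaviour that Dirichlet's theorem predicts.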

Proof of Theorem 1.1 assuming Proposition 9.1. Let $N$ be a large prime number. Define
the function $f \in L^1(\mathbb{Z}_N)$ by setting $f(n) := k^{-1}2^{-k-5}\tilde\Lambda(n)$ for $\epsilon_k N \le n \le 2\epsilon_k N$ and
$f(n) := 0$ otherwise. From Dirichlet's theorem we observe that

$$\mathbb{E}(f) = \frac{k^{-1}2^{-k-5}}{N} \sum_{\epsilon_k N \le n \le 2\epsilon_k N} \tilde\Lambda(n) = k^{-1}2^{-k-5}\epsilon_k(1 + o(1)).$$

We now apply Proposition 9.1 and Theorem 3.5 to conclude that

$$\mathbb{E}(f(x)f(x+r) \cdots f(x+(k-1)r) \mid x, r \in \mathbb{Z}_N) \ge c(k, k^{-1}2^{-k-5}\epsilon_k) - o(1).$$

Observe that the degenerate case $r = 0$ can only contribute at most $O(\frac{1}{N}\log^k N) = o(1)$
to the left-hand side and can thus be discarded. Furthermore, every progression counted
by the expression on the left is not just a progression in $\mathbb{Z}_N$, but a genuine arithmetic
progression of integers since $\epsilon_k < 1/k$. Since the right-hand side is positive (and bounded
away from zero) for sufficiently large $N$, the claim follows from the definition of $f$ and
$\tilde\Lambda$. □

Thus to obtain arbitrarily long arithmetic progressions in the primes, it will suffice to
prove Proposition 9.1. This will be the purpose of the remainder of this section (with
certain number-theoretic computations being deferred to §10 and the Appendix).

To obtain a majorant for $\tilde\Lambda(n)$, we begin with the well-known formula

$$\Lambda(n) = \sum_{d | n} \mu(d)\log(n/d) = \sum_{d | n} \mu(d)\log(n/d)_+$$

for the von Mangoldt function, where $\mu$ is the Möbius function, and $\log(x)_+$ denotes
the positive part of the logarithm, that is to say $\max(\log(x), 0)$. Here and in the sequel
$d$ is always understood to be a positive integer. Motivated by this, we define



21 In fact, all we need is that $\sum_{N \le n \le 2N} \tilde\Lambda(n) \gg N$. Thus one could avoid appealing to the theory of
Dirichlet $L$-functions by replacing $n \equiv 1 \pmod W$ by $n \equiv b \pmod W$, for some $b$ coprime to $W$ chosen
using the pigeonhole principle.
Definition 9.2 (Goldston-Yildirim truncated divisor sum). Let $R$ be a parameter (in
applications it will be a small power of $N$). Define

$$\Lambda_R(n) := \sum_{\substack{d | n \\ d \le R}} \mu(d)\log(R/d) = \sum_{d | n} \mu(d)\log(R/d)_+.$$

These truncated divisor sums have been studied in several papers, most notably the
works of Goldston and Yildirim [15, 16, 17] concerning the problem of finding small
gaps between primes. We shall use a modification of their arguments for obtaining
asymptotics for these truncated primes to prove that the measure $\nu$ defined below is
pseudorandom.
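
For concreteness, the truncated divisor sum of Definition 9.2 can be evaluated directly for small arguments. The following Python sketch is only an illustration: the names `mobius`, `truncated_divisor_sum` and `von_mangoldt` are our own, and trial division is assumed to be adequate because the inputs are tiny. Taking $R = n$ recovers the untruncated formula for $\Lambda(n)$, and for a prime $n > R$ only the term $d = 1$ survives, so that $\Lambda_R(n) = \log R$.

```python
from math import log

def mobius(d):
    """Moebius function mu(d), computed by trial division (fine for small d)."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # d has a square factor
            result = -result
        p += 1
    return -result if d > 1 else result

def truncated_divisor_sum(n, R):
    """Lambda_R(n): sum over d | n with d <= R of mu(d) * log(R / d)."""
    return sum(mobius(d) * log(R / d)
               for d in range(1, int(min(n, R)) + 1) if n % d == 0)

def von_mangoldt(n):
    """Lambda(n), via the untruncated formula sum_{d | n} mu(d) log(n / d)."""
    return truncated_divisor_sum(n, n)

if __name__ == "__main__":
    # 101 is prime and exceeds R = 10, so the value printed is log(10).
    print(truncated_divisor_sum(101, 10), log(10))
```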

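Definition 9.3. Let $R := N^{k^{-1}2^{-k-4}}$, and let $\epsilon_k := 1/(2^k(k+4)!)$. We define the function
$\nu : \mathbb{Z}_N \to \mathbb{R}^+$ by

$$\nu(n) := \begin{cases} \dfrac{\phi(W)}{W}\dfrac{\Lambda_R(Wn+1)^2}{\log R} & \text{when } \epsilon_k N \le n \le 2\epsilon_k N \\ 1 & \text{otherwise} \end{cases}$$

for all $0 \le n < N$, where we identify $\{0, \ldots, N-1\}$ with $\mathbb{Z}_N$ in the usual manner.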

This $\nu$ will be our majorant for Proposition 9.1. We first verify that it is indeed a
majorant.

Lemma 9.4. $\nu(n) \ge 0$ for all $n \in \mathbb{Z}_N$, and furthermore we have $\nu(n) \ge k^{-1}2^{-k-5}\tilde\Lambda(n)$
for all $\epsilon_k N \le n \le 2\epsilon_k N$ (if $N$ is sufficiently large depending on $k$).

Proof. The first claim is trivial. The second claim is also trivial unless $Wn + 1$ is
prime. From the definition of $R$, we see that $Wn + 1 > R$ if $N$ is sufficiently large. Then
the sum over $d | Wn + 1$, $d \le R$ in Definition 9.2 in fact consists of just the one term $d = 1$.
Therefore $\Lambda_R(Wn + 1) = \log R$, which means that $\nu(n) = \frac{\phi(W)}{W}\log R \ge k^{-1}2^{-k-5}\tilde\Lambda(n)$
by construction of $R$ and $N$ (assuming $w(N)$ sufficiently slowly growing in $N$). □

We will have to wait a while to show that $\nu$ is actually a measure (i.e. it verifies (2.4)).
The next proposition will be crucial in showing that $\nu$ has the linear forms property.

Proposition 9.5 (Goldston-Yildirim). Let $m, t$ be positive integers. For each $1 \le i \le m$,
let $\psi_i(x) := \sum_{j=1}^t L_{ij}x_j + b_i$ be linear forms with integer coefficients $L_{ij}$ such that
$|L_{ij}| \le \sqrt{w(N)}/2$ for all $i = 1, \ldots, m$ and $j = 1, \ldots, t$. We assume that the $t$-tuples
$(L_{ij})_{j=1}^t$ are never identically zero, and that no two $t$-tuples are rational multiples of each
other. Write $\theta_i := W\psi_i + 1$. Suppose that $B$ is a product $\prod_{i=1}^t I_i \subset \mathbb{R}^t$ of $t$ intervals
$I_i$, each of which has length at least $R^{10m}$. Then (if the function $w(N)$ is sufficiently
slowly growing in $N$)

$$\mathbb{E}(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 \mid x \in B) = (1 + o_{m,t}(1)) \Big(\frac{W\log R}{\phi(W)}\Big)^m.$$



Remarks. We have attributed this proposition to Goldston and Yildirim, because it
is a straightforward generalisation of [17, Proposition 2]. The $W$-trick makes much of
the analysis of the so-called singular series (which is essentially just $(W/\phi(W))^m$ here)
easier in our case, but to compensate we have the slight extra difficulty of dealing with
forms in several variables.




To keep this paper as self-contained as possible, we give a proof of Proposition 9.5. In
§10 the reader will find a proof which depends on an estimation of a certain contour
integral involving the Riemann $\zeta$-function. This is along the lines of [17, Proposition
2] but somewhat different in detail. The aforementioned integral is precisely the same
as one that Goldston and Yildirim find an asymptotic for. We recall their argument in
the Appendix.

Much the same remarks apply to the next proposition, which will be of extreme utility
in demonstrating that $\nu$ has the correlation property (Definition 3.2).

Proposition 9.6 (Goldston-Yildirim). Let $m \ge 1$ be an integer, and let $B$ be an interval
of length at least $R^{10m}$. Suppose that $h_1, \ldots, h_m$ are distinct integers satisfying $|h_i| \le N^2$
for all $1 \le i \le m$, and let $\Delta$ denote the integer

$$\Delta := \prod_{1 \le i < j \le m} |h_i - h_j|.$$

Then (for $N$ sufficiently large depending on $m$, and assuming the function $w(N)$ sufficiently
slowly growing in $N$)

$$\mathbb{E}(\Lambda_R(W(x+h_1)+1)^2 \cdots \Lambda_R(W(x+h_m)+1)^2 \mid x \in B) \le (1 + o_m(1)) \Big(\frac{W\log R}{\phi(W)}\Big)^m \prod_{p | \Delta} (1 + O_m(p^{-1/2})). \tag{9.1}$$

Here and in the sequel, $p$ is always understood to be prime.

Assuming both Proposition 9.5 and Proposition 9.6, we can now conclude the proof of
Proposition 9.1. We begin by showing that $\nu$ is indeed a measure.

Lemma 9.7. The measure $\nu$ constructed in Definition 9.3 obeys the estimate $\mathbb{E}(\nu) = 1 + o(1)$.

Proof. Apply Proposition 9.5 with $m := t := 1$, $\psi_1(x_1) := x_1$ and $B := [\epsilon_k N, 2\epsilon_k N]$
(taking $N$ sufficiently large depending on $k$, of course). Comparing with Definition 9.3
we thus have

$$\mathbb{E}(\nu(x) \mid x \in [\epsilon_k N, 2\epsilon_k N]) = 1 + o(1).$$

But from the same definition we clearly have

$$\mathbb{E}(\nu(x) \mid x \in \mathbb{Z}_N \setminus [\epsilon_k N, 2\epsilon_k N]) = 1.$$

Combining these two results confirms the lemma. □

Now we verify the linear forms condition, which is proven in a similar spirit to the above 
lemma. 

Proposition 9.8. The function $\nu$ satisfies the $(k \cdot 2^{k-1}, 3k-4, k)$-linear forms condition.

Proof. Let $\psi_i(x) = \sum_{j=1}^t L_{ij}x_j + b_i$ be linear forms of the type which feature in Definition
3.1. That is to say, we have $m \le k \cdot 2^{k-1}$, $t \le 3k - 4$, the $L_{ij}$ are rational numbers
with numerator and denominator at most $k$ in absolute value, and none of the $t$-tuples
$(L_{ij})_{j=1}^t$ is zero or is equal to a rational multiple of any other. We wish to show that

$$\mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in \mathbb{Z}_N^t) = 1 + o(1). \tag{9.2}$$

We may clear denominators and assume that all the $L_{ij}$ are integers, at the expense of
increasing the bound on $L_{ij}$ to $|L_{ij}| \le (k+1)!$. Since $w(N)$ is growing to infinity in
$N$, we may assume that $(k+1)! < \sqrt{w(N)}/2$ by taking $N$ sufficiently large. This is
required in order to apply Proposition 9.5 as we have stated it.

The two-piece definition of $\nu$ in Definition 9.3 means that we cannot apply Proposition
9.5 immediately, and we need the following localization argument.

We chop the range of summation in (9.2) into $Q^t$ almost equal-sized boxes, where
$Q = Q(N)$ is a slowly growing function of $N$ to be chosen later. Thus let

$$B_{u_1,\ldots,u_t} = \{x \in \mathbb{Z}_N^t : x_j \in [\lfloor u_j N/Q \rfloor, \lfloor (u_j+1)N/Q \rfloor),\ j = 1, \ldots, t\},$$

where the $u_j$ are to be considered (mod $Q$). Observe that up to negligible multiplicative
errors of $1 + o(1)$ (arising because the boxes do not quite have equal sizes) the left-hand
side of (9.2) can be rewritten as

$$\mathbb{E}(\mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_{u_1,\ldots,u_t}) \mid u_1, \ldots, u_t \in \mathbb{Z}_Q).$$

Call a $t$-tuple $(u_1, \ldots, u_t) \in \mathbb{Z}_Q^t$ nice if for every $1 \le i \le m$, the sets $\psi_i(B_{u_1,\ldots,u_t})$ are
either completely contained in the interval $[\epsilon_k N, 2\epsilon_k N]$ or are completely disjoint from
this interval. From Proposition 9.5 and Definition 9.3 we observe that

$$\mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_{u_1,\ldots,u_t}) = 1 + o_{m,t}(1)$$

whenever $(u_1, \ldots, u_t)$ is nice, since we can replace$^{22}$ each of the $\nu(\psi_i(x))$ factors by either
$\frac{\phi(W)}{W\log R}\Lambda_R^2(\theta_i(x))$ or 1, and $N/Q$ will exceed $R^{10m}$ for $Q$ sufficiently slowly growing in $N$,
by definition of $R$ and the upper bound on $m$. When $(u_1, \ldots, u_t)$ is not nice, then we
can crudely bound $\nu$ by $1 + \frac{\phi(W)}{W\log R}\Lambda_R^2(\theta_i(x))$, multiply out, and apply Proposition 9.5
again to obtain

$$\mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_{u_1,\ldots,u_t}) = O_{m,t}(1) + o_{m,t}(1).$$

We shall shortly show that the proportion of non-nice $t$-tuples $(u_1, \ldots, u_t)$ in $\mathbb{Z}_Q^t$ is at
most $O_{m,t}(1/Q)$, and thus the left-hand side of (9.2) is $1 + o_{m,t}(1) + O_{m,t}(1/Q)$, and the
claim follows by choosing $Q$ sufficiently slowly growing in $N$.

It remains to verify the claim about the proportion of non-nice $t$-tuples. Suppose
$(u_1, \ldots, u_t)$ is not nice. Then there exist $1 \le i \le m$ and $x, x' \in B_{u_1,\ldots,u_t}$ such that $\psi_i(x)$
lies in the interval $[\epsilon_k N, 2\epsilon_k N]$, but $\psi_i(x')$ does not. But from the definition of $B_{u_1,\ldots,u_t}$ (and
the boundedness of the $L_{ij}$) we have

$$\psi_i(x), \psi_i(x') = \sum_{j=1}^t L_{ij}\lfloor Nu_j/Q \rfloor + b_i + O_{m,t}(N/Q).$$



22 There is a technical issue here due to the failure of the quotient map $\mathbb{Z} \to \mathbb{Z}_N$ to be a bijection.
More specifically, the functions $\psi_i(x)$ only take values in the interval $[\epsilon_k N, 2\epsilon_k N]$ modulo $N$, and so
strictly speaking one needs to subtract a multiple of $N$ from $\psi_i$ in the formula below. However,
because of the relatively small dimensions of the box $B_{u_1,\ldots,u_t}$, the multiple of $N$ one needs to subtract
is independent of $x$, and so it can be absorbed into the constant term $b_i$ of the affine-linear form $\psi_i$
and thus be harmless.
Thus we must have

$$a\epsilon_k N = \sum_{j=1}^t L_{ij}\lfloor Nu_j/Q \rfloor + b_i + O_{m,t}(N/Q)$$

for either $a = 1$ or $a = 2$. Dividing by $N/Q$, we obtain

$$\sum_{j=1}^t L_{ij}u_j = a\epsilon_k Q - b_i Q/N + O_{m,t}(1) \pmod Q.$$

Since $(L_{ij})_{j=1}^t$ is non-zero, the number of $t$-tuples $(u_1, \ldots, u_t)$ which satisfy this equation
is at most $O_{m,t}(Q^{t-1})$. Letting $a$ and $i$ vary we thus see that the proportion of non-nice
$t$-tuples is at most $O_{m,t}(1/Q)$ as desired (the $m$ and $t$ dependence is irrelevant since
both are functions of $k$). □
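
The notion of a nice tuple can be explored by brute force on toy parameters. The sketch below is an illustration under our own assumptions, not the argument itself: the names `box`, `is_nice` and `non_nice_proportion` are hypothetical, each form is passed as a pair (coefficients, constant term), and the interval endpoints `lo`, `hi` stand in for $\epsilon_k N$ and $2\epsilon_k N$. With a single form $\psi(x) = x$, only the boxes that straddle an endpoint of the interval fail to be nice, which is the source of the $O(1/Q)$ proportion.

```python
from itertools import product

def box(u, N, Q):
    """The integer interval [floor(u N / Q), floor((u + 1) N / Q)) as a range."""
    return range(u * N // Q, (u + 1) * N // Q)

def is_nice(us, form, N, Q, lo, hi):
    """True if psi(B_u) mod N lies entirely inside [lo, hi] or entirely outside it."""
    coeffs, const = form
    seen = set()
    for x in product(*(box(u, N, Q) for u in us)):
        value = (sum(L * xj for L, xj in zip(coeffs, x)) + const) % N
        seen.add(lo <= value <= hi)
        if len(seen) == 2:  # some values inside the interval, some outside
            return False
    return True

def non_nice_proportion(forms, N, Q, lo, hi):
    """Proportion of tuples u in (Z_Q)^t that are not nice for at least one form."""
    t = len(forms[0][0])
    bad = sum(1 for us in product(range(Q), repeat=t)
              if not all(is_nice(us, f, N, Q, lo, hi) for f in forms))
    return bad / Q ** t

if __name__ == "__main__":
    identity = [((1,), 0)]  # the single form psi(x) = x in t = 1 variable
    for Q in (10, 100):
        print(Q, non_nice_proportion(identity, 1000, Q, 150, 250))
```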

In a short while we will use Proposition 9.6 to show that $\nu$ satisfies the correlation
condition (Definition 3.2). Prior to that, however, we must look at the average size of
the "arithmetic" factor $\prod_{p | \Delta}(1 + O_m(p^{-1/2}))$ appearing in that proposition.

Lemma 9.9. Let $m \ge 1$ be a parameter. There is a weight function $\tau = \tau_m : \mathbb{Z} \to \mathbb{R}^+$
such that $\tau(n) \ge 1$ for all $n \ne 0$, and such that for all distinct $h_1, \ldots, h_m \in [\epsilon_k N, 2\epsilon_k N]$
we have

$$\prod_{p | \Delta} (1 + O_m(p^{-1/2})) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j),$$

where $\Delta$ is defined in Proposition 9.6, and such that $\mathbb{E}(\tau^q(n) \mid 0 < |n| \le N) = O_{m,q}(1)$
for all $0 < q < \infty$.

Proof. We observe that

$$\prod_{p | \Delta} (1 + O_m(p^{-1/2})) \le \prod_{1 \le i < j \le m} \Big( \prod_{p | h_i - h_j} (1 + p^{-1/2}) \Big)^{O_m(1)}.$$

By the arithmetic mean-geometric mean inequality (absorbing all constants into the
$O_m(1)$ factor) we can thus take $\tau_m(n) := O_m(1)\prod_{p | n}(1 + p^{-1/2})^{O_m(1)}$ for all $n \ne 0$. (The
value of $\tau$ at 0 is irrelevant for this lemma since we are taking all the $h_i$ to be distinct.)
To prove the claim, it thus suffices to show that

$$\mathbb{E}\Big( \prod_{p | n} (1 + p^{-1/2})^{O_m(q)} \ \Big|\ 0 < |n| \le N \Big) = O_{m,q}(1) \quad \text{for all } 0 < q < \infty.$$

Since $(1 + p^{-1/2})^{O_m(q)}$ is bounded by $1 + p^{-1/4}$ for all but $O_{m,q}(1)$ many primes $p$, we
have

$$\mathbb{E}\Big( \prod_{p | n} (1 + p^{-1/2})^{O_m(q)} \ \Big|\ 0 < |n| \le N \Big) \le O_{m,q}(1)\, \mathbb{E}\Big( \prod_{p | n} (1 + p^{-1/4}) \ \Big|\ 0 < n \le N \Big).$$

But $\prod_{p | n}(1 + p^{-1/4}) \le \sum_{d | n} d^{-1/4}$, and hence

$$\mathbb{E}\Big( \prod_{p | n} (1 + p^{-1/2})^{O_m(q)} \ \Big|\ 0 < |n| \le N \Big) \le O_{m,q}(1) \frac{1}{2N} \sum_{1 \le |n| \le N} \sum_{d | n} d^{-1/4} \le O_{m,q}(1) \frac{1}{2N} \sum_{d=1}^N \frac{2N}{d} d^{-1/4},$$

which is $O_{m,q}(1)$ as desired. □
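
The two elementary facts used at the end of this proof, namely $\prod_{p | n}(1 + p^{-1/4}) \le \sum_{d | n} d^{-1/4}$ and the boundedness of the resulting average, can be checked numerically for small $n$. The following Python sketch is purely illustrative; the function names and the exponent parameter `s` (set to $1/4$ by default) are our own conventions.

```python
def prime_factors(n):
    """Distinct prime factors of n >= 1, found by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def euler_side(n, s=0.25):
    """prod over primes p | n of (1 + p^(-s))."""
    out = 1.0
    for p in prime_factors(n):
        out *= 1.0 + p ** (-s)
    return out

def divisor_side(n, s=0.25):
    """sum over divisors d | n of d^(-s)."""
    return sum(d ** (-s) for d in range(1, n + 1) if n % d == 0)

def average_euler_side(N, s=0.25):
    """(1/N) * sum_{n <= N} prod_{p | n} (1 + p^(-s))."""
    return sum(euler_side(n, s) for n in range(1, N + 1)) / N

if __name__ == "__main__":
    # The average stays bounded as N grows, in line with sum_d d^(-5/4) < infinity.
    for N in (100, 1000, 5000):
        print(N, average_euler_side(N))
```

Expanding the product gives a sum of $d^{-1/4}$ over the square-free divisors $d$ of $n$ only, which is why the inequality holds, with equality exactly when $n$ is square-free.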
We are now ready to verify the correlation condition. 

Proposition 9.10. The measure $\nu$ satisfies the $2^{k-1}$-correlation condition.

Proof. Let us begin by recalling what it is we wish to prove. For any $1 \le m \le 2^{k-1}$ and
$h_1, \ldots, h_m \in \mathbb{Z}_N$ we must show a bound

$$\mathbb{E}(\nu(x + h_1)\nu(x + h_2) \cdots \nu(x + h_m) \mid x \in \mathbb{Z}_N) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j), \tag{9.3}$$

where the weight function $\tau = \tau_m$ is bounded in $L^q$ for all $q$.

Fix $m, h_1, \ldots, h_m$. We shall take the weight function constructed in Lemma 9.9 (identifying
$\mathbb{Z}_N$ with the integers between $-N/2$ and $+N/2$), and set

$$\tau(0) := \exp(Cm\log N/\log\log N)$$

for some large absolute constant $C$. From the previous lemma we see that $\mathbb{E}(\tau^q) =
O_{m,q}(1)$ for all $q$, since the addition of the weight $\tau(0)$ at 0 only contributes $o_{m,q}(1)$ at
most.

We first dispose of the easy case when at least two of the $h_i$ are equal. In this case
we bound the left-hand side of (9.3) crudely by $\|\nu\|_{L^\infty}^m$. But from Definitions 9.2, 9.3
and by standard estimates for the maximal order of the divisor function $d(n)$ we have
the crude bound $\|\nu\|_{L^\infty} \ll \exp(C\log N/\log\log N)$, and the claim follows thanks to our
choice of $\tau(0)$.

Suppose then that the $h_i$ are distinct. Since, in (9.3), our aim is only to get an upper
bound, there is no need to subdivide $\mathbb{Z}_N$ into intervals as we did in the proof of
Proposition 9.8. Write

$$g(n) := \frac{\phi(W)}{W}\frac{\Lambda_R^2(Wn+1)}{\log R} 1_{[\epsilon_k N, 2\epsilon_k N]}(n).$$

Then by construction of $\nu$ (Definition 9.3), we have

$$\mathbb{E}(\nu(x + h_1) \cdots \nu(x + h_m) \mid x \in \mathbb{Z}_N) \le \mathbb{E}((1 + g(x + h_1)) \cdots (1 + g(x + h_m)) \mid x \in \mathbb{Z}_N).$$

The right-hand side may be rewritten as

$$\sum_{A \subseteq \{1,\ldots,m\}} \mathbb{E}\Big( \prod_{i \in A} g(x + h_i) \ \Big|\ x \in \mathbb{Z}_N \Big)$$

(cf. the proof of Lemma 3.4). Observe that for $i, j \in A$ we may assume $|h_i - h_j| \le
\epsilon_k N$, since the expectation vanishes otherwise. By Proposition 9.6 and Lemma 9.9 we
therefore have

$$\mathbb{E}\Big( \prod_{i \in A} g(x + h_i) \ \Big|\ x \in \mathbb{Z}_N \Big) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j) + o_m(1).$$

Summing over all $A$, and adjusting the weights $\tau$ by a bounded factor (depending only
on $m$ and hence on $k$), we obtain the result. □

Proof of Proposition 9.1. This is immediate from Lemma 9.4, Lemma 9.7, Proposition
9.8, Proposition 9.10 and the definition of $k$-pseudorandom measure, which is Definition
3.3. □



10. Correlation estimates for $\Lambda_R$

To conclude the proof of Theorem 1.1 it remains to verify Propositions 9.5 and 9.6.
That will be achieved in this section, assuming an estimate (Lemma 10.4) for a certain
class of contour integrals involving the $\zeta$-function. The proof of that estimate is given
in the preprint [17], and will be repeated in the Appendix for the sake of completeness. The
techniques of this section are also rather close to those in [17]. We are greatly indebted
to Dan Goldston for sharing this preprint with us.

The linear forms condition for $\Lambda_R$. We begin by proving Proposition 9.5. Recall that for
each $1 \le i \le m$ we have a linear form $\psi_i(x) = \sum_{j=1}^t L_{ij}x_j + b_i$ in $t$ variables $x_1, \ldots, x_t$.
The coefficients $L_{ij}$ satisfy $|L_{ij}| \le \sqrt{w(N)}/2$, where $w(N)$ is the function, tending to
infinity with $N$, which we used to set up the $W$-trick. We assume that none of the
$t$-tuples $(L_{ij})_{j=1}^t$ are zero or are rational multiples of any other. Define $\theta_i := W\psi_i + 1$.

Let $B := \prod_{j=1}^t I_j$ be a product of intervals $I_j$, each of length at least $R^{10m}$. We wish to
prove the estimate

$$\mathbb{E}(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 \mid x \in B) = (1 + o_{m,t}(1)) \Big(\frac{W\log R}{\phi(W)}\Big)^m.$$



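The first step is to eliminate the role of the box $B$. We can use Definition 9.2 to expand
the left-hand side as

$$\mathbb{E}\Big( \prod_{i=1}^m \sum_{\substack{d_i, d_i' \le R \\ d_i, d_i' | \theta_i(x)}} \mu(d_i)\mu(d_i')\log\frac{R}{d_i}\log\frac{R}{d_i'} \ \Big|\ x \in B \Big)$$

which we can rearrange as

$$\sum_{d_1,\ldots,d_m,d_1',\ldots,d_m' \le R} \Big( \prod_{i=1}^m \mu(d_i)\mu(d_i')\log\frac{R}{d_i}\log\frac{R}{d_i'} \Big)\, \mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in B \Big). \tag{10.1}$$

Because of the presence of the Möbius functions we may assume that all the $d_i$, $d_i'$ are
square-free. Write $D := [d_1, \ldots, d_m, d_1', \ldots, d_m']$ to be the least common multiple of the
$d_i$ and $d_i'$, thus $D \le R^{2m}$. Observe that the expression $\prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)}$ is periodic with
period $D$ in each of the components of $x$, and can thus be safely defined on $\mathbb{Z}_D^t$.
Since $B$ is a product of intervals of length at least $R^{10m}$, we thus see that

$$\mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in B \Big) = \mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in \mathbb{Z}_D^t \Big) + O_{m,t}(R^{-8m}).$$

The contribution of the error term $O_{m,t}(R^{-8m})$ to (10.1) can be crudely estimated by
$O_{m,t}(R^{-6m}\log^{2m} R)$, which is easily acceptable. Our task is thus to show that

$$\sum_{d_1,\ldots,d_m,d_1',\ldots,d_m' \le R} \Big( \prod_{i=1}^m \mu(d_i)\mu(d_i')\log\frac{R}{d_i}\log\frac{R}{d_i'} \Big)\, \mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in \mathbb{Z}_D^t \Big) = (1 + o_{m,t}(1)) \Big(\frac{W\log R}{\phi(W)}\Big)^m. \tag{10.2}$$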



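To prove (10.2), we shall perform a number of standard manipulations (as in [17]) to
rewrite the left-hand side as a contour integral of an Euler product, which in turn can
be rewritten in terms of the Riemann $\zeta$-function and some other simple factors. We
begin by using the Chinese remainder theorem (and the square-free nature of $d_i$, $d_i'$) to
rewrite

$$\mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in \mathbb{Z}_D^t \Big) = \prod_{p | D} \mathbb{E}\Big( \prod_{i : p | d_i d_i'} 1_{\theta_i(x) \equiv 0 \pmod p} \ \Big|\ x \in \mathbb{Z}_p^t \Big).$$

Note that the restriction that $p$ divides $D$ can be dropped since the multiplicand is 1
otherwise. In particular, if we write $X_{d_1,\ldots,d_m}(p) := \{1 \le i \le m : p | d_i\}$ and

$$\omega_X(p) := \mathbb{E}\Big( \prod_{i \in X} 1_{\theta_i(x) \equiv 0 \pmod p} \ \Big|\ x \in \mathbb{Z}_p^t \Big) \tag{10.3}$$

for each subset $X \subseteq \{1, \ldots, m\}$, then we have

$$\mathbb{E}\Big( \prod_{i=1}^m 1_{d_i, d_i' | \theta_i(x)} \ \Big|\ x \in \mathbb{Z}_D^t \Big) = \prod_p \omega_{X_{d_1,\ldots,d_m}(p) \cup X_{d_1',\ldots,d_m'}(p)}(p).$$

We can thus write the left-hand side of (10.2) as

$$\sum_{d_1,\ldots,d_m,d_1',\ldots,d_m' \in \mathbb{Z}^+} \Big( \prod_{i=1}^m \mu(d_i)\mu(d_i') \Big(\log\frac{R}{d_i}\Big)_+ \Big(\log\frac{R}{d_i'}\Big)_+ \Big) \prod_p \omega_{X_{d_1,\ldots,d_m}(p) \cup X_{d_1',\ldots,d_m'}(p)}(p).$$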

To proceed further, we need to express the logarithms in terms of multiplicative functions
of the $d_i$, $d_i'$. To this end, we introduce the vertical line contour $\Gamma_1$ parameterised
by

$$\Gamma_1(t) := \frac{1}{\log R} + it; \quad -\infty < t < +\infty \tag{10.4}$$

and observe the contour integration identity

$$\frac{1}{2\pi i} \int_{\Gamma_1} \frac{x^z}{z^2}\, dz = (\log x)_+$$

valid for any real $x > 0$. The choice of $\frac{1}{\log R}$ for the real part of $\Gamma_1$ is not currently
relevant, but will be convenient later when we estimate the contour integrals that emerge
(in particular, $R^z$ is bounded on $\Gamma_1$, while $1/z^2$ is not too large). Using this identity,
we can rewrite the left-hand side of (10.2) as

$$(2\pi i)^{-2m} \int_{\Gamma_1} \cdots \int_{\Gamma_1} F(z, z') \prod_{j=1}^m \frac{R^{z_j + z_j'}}{z_j^2 z_j'^2}\, dz_j\, dz_j' \tag{10.5}$$

where there are $2m$ contour integrations in the variables $z_1, \ldots, z_m, z_1', \ldots, z_m'$ on $\Gamma_1$,
$z := (z_1, \ldots, z_m)$ and $z' := (z_1', \ldots, z_m')$, and

$$F(z, z') := \sum_{d_1,\ldots,d_m,d_1',\ldots,d_m' \in \mathbb{Z}^+} \Big( \prod_{j=1}^m \frac{\mu(d_j)\mu(d_j')}{d_j^{z_j} d_j'^{z_j'}} \Big) \prod_p \omega_{X_{d_1,\ldots,d_m}(p) \cup X_{d_1',\ldots,d_m'}(p)}(p). \tag{10.6}$$

We have changed the indices from $i$ to $j$ to avoid conflict with the square root of $-1$. Observe
that the summand in (10.6) is a multiplicative function of $D = [d_1, \ldots, d_m, d_1', \ldots, d_m']$
and thus we have (formally, at least) the Euler product representation $F(z, z') =
\prod_p E_p(z, z')$, where

$$E_p(z, z') := \sum_{X, X' \subseteq \{1,\ldots,m\}} \frac{(-1)^{|X| + |X'|}\, \omega_{X \cup X'}(p)}{p^{\sum_{j \in X} z_j + \sum_{j \in X'} z_j'}}. \tag{10.7}$$

From (10.3) we have $\omega_\emptyset(p) = 1$ and $\omega_X(p) \le 1$, and so $E_p(z, z') = 1 + O_\sigma(1/p^\sigma)$ when
$\Re(z_j), \Re(z_j') > \sigma$ (we obtain more precise estimates below). Thus this Euler product is
absolutely convergent to $F(z, z')$ in the domain $\{\Re(z_j), \Re(z_j') > 1\}$ at least.
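
The contour integration identity $\frac{1}{2\pi i}\int_{\Gamma_1} x^z z^{-2}\,dz = (\log x)_+$ used above is easy to sanity-check numerically. The Python sketch below is an illustration under our own assumptions: we take the real part of the contour to be $c = 1$ rather than $1/\log R$ (the identity does not depend on this choice), truncate the line at height $T$, and use a midpoint rule. The function name `contour_integral` and the numerical parameters are hypothetical choices.

```python
import cmath
from math import log, pi

def contour_integral(x, c=1.0, T=500.0, steps=100_000):
    """Approximate (1 / (2 pi i)) * integral of x^z / z^2 dz over Re z = c, |Im z| <= T.

    Writing z = c + i t gives dz = i dt, so the factor i cancels and we are left
    with (1 / (2 pi)) * integral of x^z / z^2 dt. The imaginary parts cancel by
    symmetry in t, so only the real part is accumulated (midpoint rule).
    """
    h = 2.0 * T / steps
    log_x = log(x)
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * h
        z = complex(c, t)
        total += (cmath.exp(z * log_x) / (z * z)).real
    return total * h / (2.0 * pi)

if __name__ == "__main__":
    # Compare with the positive part of the logarithm at a few sample points.
    for x in (0.5, 1.0, 5.0):
        print(x, contour_integral(x), max(log(x), 0.0))
```

The truncation error is at most of size $x^c/(\pi T)$, so the agreement is only approximate; for $x < 1$ the computed value is close to zero, reflecting the fact that the contour can then be shifted to the right.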

To proceed further we need to exploit the hypothesis that the linear parts of $\psi_1, \ldots, \psi_m$
are non-zero and not rational multiples of each other. This shall be done via the
following elementary estimates on $\omega_X(p)$.

Lemma 10.1 (Local factor estimate). If $p \le w(N)$, then $\omega_X(p) = 0$ for all non-empty
$X$; in particular, $E_p = 1$ when $p \le w(N)$. If instead $p > w(N)$, then $\omega_X(p) = p^{-1}$ when
$|X| = 1$ and $\omega_X(p) \le p^{-2}$ when $|X| \ge 2$.

Proof. The first statement is clear, since the maps $\theta_j : \mathbb{Z}_p^t \to \mathbb{Z}_p$ are identically 1 when
$p \le w(N)$. The second statement (when $p > w(N)$ and $|X| = 1$) is similar since in this
case $\theta_j$ uniformly covers $\mathbb{Z}_p$. Now suppose $p > w(N)$ and $|X| \ge 2$. We claim that none
of the $m$ pure linear forms $W(\psi_i - b_i)$ is a multiple of any other (mod $p$). Indeed, if this
were so then we should have $L_{ij}L_{i'j}^{-1} \equiv \lambda \pmod p$ for some $\lambda$, and for all $j = 1, \ldots, t$. But
if $a/q$ and $a'/q'$ are two rational numbers in lowest terms, with $|a|, |a'|, q, q' < \sqrt{w(N)}/2$,
then clearly $a/q \not\equiv a'/q' \pmod p$ unless $a = a'$, $q = q'$. It follows that the two pure linear
forms $\psi_i - b_i$ and $\psi_{i'} - b_{i'}$ are rational multiples of one another, contrary to assumption.
Thus the set of $x \in (\mathbb{Z}/p\mathbb{Z})^t$ for which $\theta_i(x) \equiv 0 \pmod p$ for all $i \in X$ is contained in
the intersection of two skew affine subspaces of $(\mathbb{Z}/p\mathbb{Z})^t$, and as such has cardinality at
most $p^{t-2}$. □
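
The three cases of Lemma 10.1 can be observed by brute-force enumeration on a toy example. In the Python sketch below the helper `omega` and the sample data are our own illustrative assumptions: we take $w = 3$, hence $W = 6$, and the two forms $\psi_1 = x_1 + x_2$, $\psi_2 = x_1 - x_2$. The coefficient bound $|L_{ij}| \le \sqrt{w(N)}/2$ of the lemma is not imposed here; we only use that the two linear parts are not multiples of one another modulo the primes tested.

```python
from fractions import Fraction
from itertools import product

def omega(X, forms, W, p):
    """omega_X(p): proportion of x in (Z/pZ)^t with W * psi_i(x) + 1 = 0 (mod p) for all i in X.

    Each form is a pair (coefficients, constant) describing
    psi_i(x) = sum_j coefficients[j] * x[j] + constant.
    """
    t = len(forms[0][0])
    hits = 0
    for x in product(range(p), repeat=t):
        if all((W * (sum(L * xj for L, xj in zip(forms[i][0], x)) + forms[i][1]) + 1) % p == 0
               for i in X):
            hits += 1
    return Fraction(hits, p ** t)

# Hypothetical toy data: W = 2 * 3 and two forms in t = 2 variables.
FORMS = [((1, 1), 0), ((1, -1), 0)]

if __name__ == "__main__":
    for p in (2, 3, 7, 11):
        print(p, omega([0], FORMS, 6, p), omega([0, 1], FORMS, 6, p))
```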

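This lemma implies, comparing with (10.7), that

$$E_p(z, z') = 1 - 1_{p > w(N)} \sum_{j=1}^m (p^{-1-z_j} + p^{-1-z_j'} - p^{-1-z_j-z_j'}) + 1_{p > w(N)} \sum_{\substack{X, X' \subseteq \{1,\ldots,m\} \\ |X \cup X'| \ge 2}} \frac{O(1/p^2)}{p^{\sum_{j \in X} z_j + \sum_{j \in X'} z_j'}} \tag{10.8}$$

where the $O(1/p^2)$ numerator does not depend on $z, z'$. To take advantage of this
expansion, we factorise $E_p = E_p^{(1)}E_p^{(2)}E_p^{(3)}$, where

$$E_p^{(1)}(z, z') := \frac{E_p(z, z')}{\prod_{j=1}^m (1 - 1_{p > w(N)}p^{-1-z_j})(1 - 1_{p > w(N)}p^{-1-z_j'})(1 - 1_{p > w(N)}p^{-1-z_j-z_j'})^{-1}}$$

$$E_p^{(2)}(z, z') := \prod_{j=1}^m (1 - 1_{p \le w(N)}p^{-1-z_j})^{-1}(1 - 1_{p \le w(N)}p^{-1-z_j'})^{-1}(1 - 1_{p \le w(N)}p^{-1-z_j-z_j'})$$

$$E_p^{(3)}(z, z') := \prod_{j=1}^m (1 - p^{-1-z_j})(1 - p^{-1-z_j'})(1 - p^{-1-z_j-z_j'})^{-1}.$$

Writing $G_j := \prod_p E_p^{(j)}$ for $j = 1, 2, 3$, one thus has $F = G_1G_2G_3$ (at least for $\Re(z_j), \Re(z_j')$
sufficiently large). If we introduce the Riemann $\zeta$-function $\zeta(s) := \prod_p (1 - \frac{1}{p^s})^{-1}$ then
we have

$$G_3(z, z') = \prod_{j=1}^m \frac{\zeta(1 + z_j + z_j')}{\zeta(1 + z_j)\zeta(1 + z_j')} \tag{10.9}$$

so in particular $G_3$ can be continued meromorphically to all of $\mathbb{C}^{2m}$. As for the other
two factors, we have the following estimates which allow us to continue these factors a
little bit to the left of the imaginary axes.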

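Definition 10.2. For any $\sigma > 0$, let $\mathcal{D}_\sigma^m \subset \mathbb{C}^{2m}$ denote the domain

$$\mathcal{D}_\sigma^m := \{(z_j, z_j') : -\sigma < \Re(z_j), \Re(z_j') < 100,\ j = 1, \ldots, m\}.$$

If $G = G(z, z')$ is an analytic function of $2m$ complex variables on $\mathcal{D}_\sigma^m$, we define the
$C^k(\mathcal{D}_\sigma^m)$ norm of $G$ for any integer $k \ge 0$ as

$$\|G\|_{C^k(\mathcal{D}_\sigma^m)} := \sup_{a_1,\ldots,a_m,a_1',\ldots,a_m'} \Big\| \Big(\frac{\partial}{\partial z_1}\Big)^{a_1} \cdots \Big(\frac{\partial}{\partial z_m}\Big)^{a_m} \Big(\frac{\partial}{\partial z_1'}\Big)^{a_1'} \cdots \Big(\frac{\partial}{\partial z_m'}\Big)^{a_m'} G \Big\|_{L^\infty(\mathcal{D}_\sigma^m)}$$

where $a_1, \ldots, a_m, a_1', \ldots, a_m'$ range over all non-negative integers with total sum at most
$k$.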

Lemma 10.3. The Euler products $\prod_p E_p^{(j)}$ for $j = 1, 2$ are absolutely convergent in
the domain $\mathcal{D}^m_{1/6m}$. In particular, $G_1, G_2$ can be continued analytically to this domain.
Furthermore, we have the estimates

$$\|G_1\|_{C^m(\mathcal{D}^m_{1/6m})} \le O_m(1)$$

$$\|G_2\|_{C^m(\mathcal{D}^m_{1/6m})} \le O_{m,w(N)}(1)$$

$$G_1(0, 0) = 1 + o_m(1)$$

$$G_2(0, 0) = (W/\phi(W))^m.$$

Remark. The choice $\sigma = 1/6m$ is of course not best possible, but in fact any small
positive quantity depending on $m$ would suffice for our argument here. The dependence
of $O_{m,w(N)}(1)$ on $w(N)$ is not important, but one can easily obtain (for instance) growth
bounds of the form $w(N)^{O_m(w(N))}$.

Proof. First consider $j = 1$. From (10.8) and Taylor expansion we have the crude bound
$E_p^{(1)}(z, z') = 1 + O_m(p^{-2+4/6m})$ in $\mathcal{D}^m_{1/6m}$, which gives the desired convergence and also
the $C^m(\mathcal{D}^m_{1/6m})$ bound on $G_1$; the estimate for $G_1(0, 0)$ also follows since the Euler factors
$E_p^{(1)}(z, z')$ are identically 1 when $p \le w(N)$. The bounds for $G_2$ are easy since this is just
a finite Euler product involving at most $w(N)$ terms; the formula for $G_2(0, 0)$ follows
from direct calculation since $\phi(W)/W = \prod_{p \le w(N)}(1 - \frac{1}{p})$. □

To estimate (10.5), we now invoke the following contour integration lemma.

Lemma 10.4. [17] Let $R$ be a positive real number. Let $G = G(z, z')$ be an analytic
function of $2m$ complex variables on the domain $\mathcal{D}_\sigma^m$ for some $\sigma > 0$, and suppose that

$$\|G\|_{C^m(\mathcal{D}_\sigma^m)} = \exp(O_{m,\sigma}(\log^{1/3} R)). \tag{10.10}$$

Then

$$\frac{1}{(2\pi i)^{2m}} \int_{\Gamma_1} \cdots \int_{\Gamma_1} G(z, z') \prod_{j=1}^m \frac{\zeta(1 + z_j + z_j')}{\zeta(1 + z_j)\zeta(1 + z_j')} \frac{R^{z_j + z_j'}}{z_j^2 z_j'^2}\, dz_j\, dz_j'$$

$$= G(0, \ldots, 0)\log^m R + \sum_{j=1}^m O_{m,\sigma}(\|G\|_{C^j(\mathcal{D}_\sigma^m)}\log^{m-j} R) + O_{m,\sigma}(e^{-\delta\sqrt{\log R}})$$

for some $\delta = \delta(m) > 0$.

Proof. While this lemma is essentially in [17], we shall give a complete proof in the
Appendix for the sake of completeness. □

We apply this lemma with $G := G_1G_2$ and $\sigma := 1/6m$. From Lemma 10.3 and the
Leibnitz rule we have the bounds

$$\|G\|_{C^j(\mathcal{D}^m_{1/6m})} \le O_{j,m,w(N)}(1) \quad \text{for all } 0 \le j \le m,$$

and in particular we obtain (10.10) by choosing $w(N)$ to grow sufficiently slowly in $N$.
Also we have $G(0, 0) = (1 + o_m(1))(\frac{W}{\phi(W)})^m$ from that lemma. We conclude (again taking
$w(N)$ sufficiently slowly growing in $N$) that the quantity in (10.5) is $(1 + o_m(1))(\frac{W\log R}{\phi(W)})^m$,
as desired. This concludes the proof of Proposition 9.5. □

Higher order correlations for $\Lambda_R$. We now prove Proposition 9.6, using arguments very
similar to those used to prove Proposition 9.5. The main differences here are that the
number of variables $t$ is just equal to 1, but on the other hand all the linear forms are
equal to each other, $\psi_i(x_1) = x_1$. In particular, these linear forms are now rational
multiples of each other and so Lemma 10.1 no longer applies. However, the arguments
before that Lemma are still valid; thus we can still write the left-hand side of (9.1) as
an expression of the form (10.5) plus an acceptable error, where $F$ is again defined by
(10.6) and $E_p$ is defined by (10.7); the difference now is that $\omega_X(p)$ is the quantity

$$\omega_X(p) := \mathbb{E}\Big( \prod_{i \in X} 1_{W(x + h_i) + 1 \equiv 0 \pmod p} \ \Big|\ x \in \mathbb{Z}_p \Big).$$

Again we have $\omega_\emptyset(p) = 1$ for all $p$. The analogue of Lemma 10.1 is as follows.

Lemma 10.5. If $p \le w(N)$, then $\omega_X(p) = 0$ for all non-empty $X$; in particular,
$E_p = 1$ when $p \le w(N)$. If instead $p > w(N)$, then $\omega_X(p) = p^{-1}$ when $|X| = 1$ and
$\omega_X(p) \le p^{-1}$ when $|X| \ge 2$. Furthermore, if $|X| \ge 2$ then $\omega_X(p) = 0$ unless $p$ divides
$\Delta := \prod_{1 \le i < j \le m} |h_i - h_j|$.

Proof. When $p \le w(N)$ then $W(x + h_i) + 1 \equiv 1 \pmod p$ and the claim follows. When
$p > w(N)$ and $|X| \ge 1$, $\omega_X(p)$ is equal to $1/p$ when the residue classes $\{h_i \pmod p : i \in
X\}$ are all equal, and zero otherwise, and the claim again follows. □
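
As with Lemma 10.1, the dichotomy in Lemma 10.5 can be observed directly on small data. In this Python sketch the function name `omega_shift` and the sample shifts are illustrative assumptions: we take $W = 6$ (so $w = 3$) and $h = (0, 7, 3)$, for which $\Delta = 7 \cdot 3 \cdot 4 = 84$ and the only prime $p > w$ dividing $\Delta$ is $p = 7$.

```python
from fractions import Fraction

def omega_shift(X, shifts, W, p):
    """omega_X(p): proportion of x in Z/pZ with W * (x + h_i) + 1 = 0 (mod p) for all i in X."""
    hits = sum(1 for x in range(p)
               if all((W * (x + shifts[i]) + 1) % p == 0 for i in X))
    return Fraction(hits, p)

# Hypothetical toy data: shifts h_1, h_2, h_3 and W = 2 * 3.
SHIFTS = (0, 7, 3)

if __name__ == "__main__":
    for p in (3, 5, 7):
        print(p, omega_shift([0], SHIFTS, 6, p), omega_shift([0, 1], SHIFTS, 6, p))
```

Here $h_1 \equiv h_2 \pmod 7$, so the pair $X = \{1, 2\}$ (indices 0 and 1 in the code) gives $1/7$ at $p = 7$, while every pair gives $0$ at $p = 5$, which does not divide $\Delta$.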
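In light of this lemma, the analogue of (10.8) is now

$$E_p(z, z') = 1 - 1_{p > w(N)} \sum_{j=1}^m (p^{-1-z_j} + p^{-1-z_j'} - p^{-1-z_j-z_j'}) + 1_{p > w(N),\, p | \Delta}\, \lambda_p(z, z') \tag{10.11}$$

where $\lambda_p(z, z')$ is an expression of the form

$$\lambda_p(z, z') = \sum_{\substack{X, X' \subseteq \{1,\ldots,m\} \\ |X \cup X'| \ge 2}} \frac{O(1/p)}{p^{\sum_{j \in X} z_j + \sum_{j \in X'} z_j'}}$$

and the $O(1/p)$ quantities do not depend on $z, z'$. We can thus factorise

$$E_p = E_p^{(0)}E_p^{(1)}E_p^{(2)}E_p^{(3)},$$

where

$$E_p^{(0)} := 1 + 1_{p > w(N),\, p | \Delta}\, \lambda_p(z, z')$$

$$E_p^{(1)} := \frac{E_p}{E_p^{(0)} \prod_{j=1}^m (1 - 1_{p > w(N)}p^{-1-z_j})(1 - 1_{p > w(N)}p^{-1-z_j'})(1 - 1_{p > w(N)}p^{-1-z_j-z_j'})^{-1}}$$

$$E_p^{(2)} := \prod_{j=1}^m (1 - 1_{p \le w(N)}p^{-1-z_j})^{-1}(1 - 1_{p \le w(N)}p^{-1-z_j'})^{-1}(1 - 1_{p \le w(N)}p^{-1-z_j-z_j'})$$

$$E_p^{(3)} := \prod_{j=1}^m (1 - p^{-1-z_j})(1 - p^{-1-z_j'})(1 - p^{-1-z_j-z_j'})^{-1}.$$

Write $G_j := \prod_p E_p^{(j)}$. Then, as before, $F = G_0G_1G_2G_3$ and $G_3$ is given by (10.9) as
before. As for $G_0, G_1, G_2$, we have the following analogue of Lemma 10.3.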



Lemma 10.6. Let $0 < \sigma < 1/6m$. Then the Euler products $\prod_p E_p^{(l)}$ for $l = 0, 1, 2$ are
absolutely convergent in the domain $\mathcal{D}_\sigma^m$. In particular, $G_0, G_1, G_2$ can be continued
analytically to this domain. Furthermore, we have the estimates

$$\|G_0\|_{C^r(\mathcal{D}_\sigma^m)} \le O_m\Big(\frac{\log R}{\log\log R}\Big)^r \prod_{p | \Delta} (1 + O_m(p^{2m\sigma - 1})) \quad \text{for } 0 \le r \le m \tag{10.12}$$

$$\|G_0\|_{C^m(\mathcal{D}^m_{1/6m})} \le \exp(O_m(\log^{1/3} R)) \tag{10.13}$$

$$\|G_1\|_{C^m(\mathcal{D}^m_{1/6m})} \le O_m(1)$$

$$\|G_2\|_{C^m(\mathcal{D}^m_{1/6m})} \le O_{m,w(N)}(1)$$

$$G_0(0, 0) = \prod_{p | \Delta} (1 + O_m(p^{-1/2})) \tag{10.14}$$

$$G_1(0, 0) = 1 + o_m(1)$$

$$G_2(0, 0) = (W/\phi(W))^m.$$

Proof. The estimates for $G_1$ and $G_2$ proceed exactly as in Lemma 10.3 (the additional
factors of $\lambda_p(z, z')$ which appear on both the numerator and denominator of $E_p^{(1)}$ cancel
to first order, and thus do not present any new difficulties); it is the estimates for $G_0$
which are the most interesting.

We begin by proving (10.12). Fix $l$. First observe that $G_0 = \prod_{p | \Delta} E_p^{(0)}$. Now the
number of primes dividing $\Delta$ is at most $O(\log\Delta/\log\log\Delta)$. Using the crude bound

$$\Delta = \prod_{1 \le i < j \le m} |h_i - h_j| \le N^{m^2} \le R^{O_m(1)}, \tag{10.15}$$

we thus see that the number of factors in the Euler product for $G_0$ is $O_m(\frac{\log R}{\log\log R})$.
Upon differentiating $r$ times for any $0 \le r \le m$ using the Leibnitz rule, one gets a sum
of $O_m((\log R/\log\log R)^r)$ terms, each of which consists of $O_m(\log R/\log\log R)$ factors,
each of which is equal to some derivative of $1 + \lambda_p(z, z')$ of order between 0 and $r$. On
$\mathcal{D}_\sigma^m$, each factor is bounded by $1 + O_m(p^{2m\sigma - 1})$ (in fact, the terms containing a non-zero
number of derivatives will be much smaller since the constant term 1 is eliminated).
This gives (10.12).

Now we prove (10.13). In light of (10.12), it suffices to show that

$$\prod_{p | \Delta} (1 + O_m(p^{2m\sigma - 1})) \le \exp(O_m(\log^{1/3} R)).$$

Taking logarithms and using the hypothesis $\sigma < 1/6m$ (and (10.15)), we reduce to
showing

$$\sum_{p | \Delta} p^{-2/3} \le O(\log^{1/3}\Delta).$$

But there are at most $O(\log\Delta/\log\log\Delta)$ primes dividing $\Delta$, hence the left-hand side
can be crudely bounded by

$$\sum_{1 \le n \le O(\log\Delta/\log\log\Delta)} n^{-2/3} = O(\log^{1/3}\Delta)$$

as desired.

The bound (10.14) now follows from the crude estimate $E_p^{(0)}(z, z') = 1 + O_m(p^{-1/2})$. □
We now apply Lemma 10.4 with $\sigma := 1/6m$ and $G := G_0G_1G_2$. Again by the Leibnitz
rule we have the bound (10.10), and furthermore

$$\|G\|_{C^r(\mathcal{D}^m_{1/6m})} \le O_m(1)\, O_{m,w(N)}(1) \Big(\frac{\log R}{\log\log R}\Big)^r \prod_{p | \Delta} (1 + O_m(p^{-1/2}))$$

for all $0 \le r \le m$. From Lemma 10.6 and Lemma 10.4 we can then estimate (10.5) as

$$\le (1 + o_m(1)) \Big(\frac{W}{\phi(W)}\Big)^m \log^m R \prod_{p | \Delta} (1 + O_m(p^{-1/2})) + O_{m,w(N)}\Big(\frac{\log^m R}{\log\log R}\Big) \prod_{p | \Delta} (1 + O_m(p^{-1/2})) + O_m(e^{-\delta\sqrt{\log R}}).$$

The claim (9.1) then follows by choosing $w(N)$ (and hence $W$) sufficiently slowly growing
in $N$ (and hence in $R$). Proposition 9.6 follows. □

Remark. It should be clear that the above argument not only gives an upper bound
for the left-hand side of (9.1), but in fact gives an asymptotic, by working out $G_0(0, 0)$
more carefully; this is worked out in detail (in the $W = 1$ case) in [17].

11. Further remarks 

In this section we discuss some extensions and refinements of our main result. First
of all, notice that our proof actually shows that there is some constant $\gamma(k)$ such
that the number of $k$-term progressions of primes, all less than $N$, is at least $(\gamma(k) +
o(1))N^2/\log^k N$. This is because the error term in (3.9) does not actually need to be
$o(1)$, but merely less than $\frac{1}{2}c(k, \delta) + o(1)$ (for instance). Working backwards through
the proof, this eventually reveals that the quantity $w(N)$ does not actually need to
be growing in $N$, but can instead be a fixed number depending only on $k$ (although
this number will be very large because our final bounds $o(1)$ decayed to zero extremely
slowly). Thus $W$ can be made independent of $N$, and so the loss incurred by the $W$-trick
when passing from primes to primes equal to 1 mod $W$ is bounded uniformly in
$N$. Nevertheless the bound we obtain on $\gamma(k)$ is extremely poor, in part because of the
growth of constants in the best known bounds $c(k, \delta)$ on Szemeredi's theorem in [19],
but also because we have not attempted to optimise the decay rate of the $o(1)$ factors
and hence will need to take $w(N)$ to be extremely large. In the other direction, standard
sieve theory arguments show that the number of $k$-term progressions of primes all less
than $N$ is at most $O_k(N^2/\log^k N)$, and so the lower bounds are only off by a constant
depending on $k$.



As we remarked earlier, our method also extends to prove Theorem 1.2, namely that any
subset of the primes with positive relative upper density contains a $k$-term arithmetic
progression. The only significant change$^{23}$ to the proof is that one must use the
pigeonhole principle to replace the residue class $n \equiv 1 \pmod{W}$ by a more general
residue class $n \equiv b \pmod{W}$ for some $b$ coprime to $W$, since the set $A$ in
Theorem 1.2 does not need to obey a Dirichlet-type theorem in these residue classes.
However it is easy to verify that this does not significantly affect the rest of the
argument, and we leave the details to the reader.

23 Also, since we are only assuming positivity of the upper density and not the lower density, we
only have good density control for an infinite sequence $N_1, N_2, \ldots \to \infty$ of integers, which may not be
prime. However one can easily use Bertrand's postulate (for instance) to make the $N_j$ prime, giving up
a factor of $O(1)$ at most.
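The pigeonhole step can be illustrated on a finite set. The sketch below is only an illustration under stated assumptions: `best_residue_class` is a hypothetical helper name, `elements` stands for a finite truncation of the set $A$, and ties are broken in favour of the smallest residue. Among the residue classes $b$ modulo $W$ with $b$ coprime to $W$, it returns one containing the most elements, which by the pigeonhole principle holds at least a $1/\phi(W)$ proportion of the elements coprime to $W$.

```python
from math import gcd


def best_residue_class(elements, W):
    """Return (b, count) for the class b mod W, gcd(b, W) = 1, holding the most elements."""
    # One counter per residue class coprime to W.
    counts = {b: 0 for b in range(W) if gcd(b, W) == 1}
    for n in elements:
        r = n % W
        if r in counts:
            counts[r] += 1
    # Largest count wins; ties go to the smallest residue.
    b = max(counts, key=lambda r: (counts[r], -r))
    return b, counts[b]


if __name__ == "__main__":
    primes_up_to_50 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
    print(best_residue_class(primes_up_to_50, 6))
```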

Applying Theorem 1.2 to the set of primes $p \equiv 1 \pmod{4}$, we obtain the previously
unknown fact that there are arbitrarily long progressions consisting of numbers which
are the sum of two squares. For this problem, more satisfactory results were known for
small $k$ than was the case for the primes. Let $S$ be the set of sums of two squares. It is a
simple matter to show that there are infinitely many 4-term arithmetic progressions in $S$.
Indeed, Heath-Brown [26] observed that the numbers $(n-1)^2 + (n-8)^2$, $(n-7)^2 + (n+4)^2$,
$(n+7)^2 + (n-4)^2$ and $(n+1)^2 + (n+8)^2$ always form such a progression; in fact, he was
able to prove much more, in particular finding an asymptotic for the number of 4-term
progressions in $S$, all of whose members are at most $N$ (weighted by $r(n)$, the number
of representations of $n$ as the sum of two squares).
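Heath-Brown's four expressions are easy to check by machine. The following small sketch uses the illustrative names `heath_brown_progression` and `is_arithmetic_progression`, which are not from the source. Expanding each expression gives $2n^2 + 65$ plus $-18n$, $-6n$, $6n$, $18n$ respectively, so the common difference is $12n$.

```python
def heath_brown_progression(n):
    """Return the four sums of two squares from Heath-Brown's observation, for a given n."""
    return [
        (n - 1) ** 2 + (n - 8) ** 2,
        (n - 7) ** 2 + (n + 4) ** 2,
        (n + 7) ** 2 + (n - 4) ** 2,
        (n + 1) ** 2 + (n + 8) ** 2,
    ]


def is_arithmetic_progression(terms):
    """Return True if consecutive differences of `terms` are all equal."""
    diffs = {b - a for a, b in zip(terms, terms[1:])}
    return len(diffs) == 1


if __name__ == "__main__":
    for n in range(1, 6):
        terms = heath_brown_progression(n)
        print(n, terms, is_arithmetic_progression(terms))
```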

It is reasonably clear that our method will produce long arithmetic progressions for
many sets of primes for which one can give a lower bound which agrees with some
upper bound coming from a sieve, up to a multiplicative constant. Invoking Chen's
famous theorem [7] to the effect that there are $\gg N/\log^2 N$ primes $p \leq N$ for which
$p + 2$ is a prime or a product of two primes, it ought to be a simple matter to adapt
our arguments to show that there are arbitrarily long arithmetic progressions $p_1, \ldots, p_k$
of primes, such that each $p_i + 2$ is either prime or the product of two primes; indeed
there should be $\gg N^2/\log^{2k} N$ such progressions with entries less than $N$. Whilst we do
not plan$^{24}$ to write a detailed proof of this fact, we will in [23] give a proof of the case
$k = 3$ using harmonic analysis.

The methods in this paper suggest a more general "transference principle", in that if
a type of pattern (such as an arithmetic progression) is forced to arise infinitely often
within sets of positive density, then it should also be forced to arise infinitely often
inside the prime numbers, or more generally inside any subset of a pseudorandom set
(such as the "almost primes") of positive relative density. Thus, for instance, one is
led to conjecture a Bergelson-Leibman type result (cf. [4]) for primes. That is, one
could hope to show that if $F_i : \mathbb{N} \to \mathbb{N}$ are polynomials with $F_i(0) = 0$, then there are
infinitely many configurations $(a + F_1(d), \ldots, a + F_k(d))$ in which all $k$ elements are
prime. This however seems to require some modification$^{25}$ to our current argument,
in large part because of the need to truncate the step parameter $d$ to be at most a
small power of $N$. In a similar spirit, the work of Furstenberg and Katznelson [11] on
multidimensional analogues of Szemeredi's theorem, combined with this transference
principle, now suggests that one should be able to show$^{26}$ that the Gaussian primes in
$\mathbb{Z}[i]$ contain infinitely many constellations of any prescribed shape, and similarly for
other number fields. Furthermore, the later work of Furstenberg and Katznelson [12]
on density Hales-Jewett theorems suggests that one could also show that for any finite
field $F$, the monic irreducible polynomials in $F[t]$ contain affine subspaces over $F$ of
arbitrarily high dimension. Again, these results would require non-trivial modifications
to our argument for a number of reasons, not least of which is the fact that the
characteristic factors for these more advanced generalizations of Szemeredi's theorem are
much less well understood.

24 Very briefly, the idea is to replace the function $\Lambda_R(Wn + 1)$ in the definition of the pseudorandom
measure $\nu$ with a variant such as $\Lambda_R(Wn + b)\Lambda_R(Wn + b + 2)$ for some $1 \leq b < W$ for which $b, b + 2$ are
both coprime to $W$; one can use Chen's theorem and the pigeonhole principle to locate a $b$ for which
this majorant will capture a large number of Chen primes. We leave the details to the reader.

25 Note added in press: such a result has been obtained by the second author and T. Ziegler, to
appear in Acta Math.

26 Note added in press: such a result has been obtained by the second author, J. d'Analyse
Mathematique 99 (2006), 109-176.



Appendix A. Proof of Lemma 10.4

In this appendix we prove Lemma 10.4. This lemma was essentially proven in [17],
but for the sake of self-containedness we provide a complete proof here (following very
closely the approach in [17]).

Throughout this section, $R \geq 2$, $m \geq 1$, and $\sigma > 0$ will be fixed. We shall use
$\delta > 0$ to denote various small constants, which may vary from line to line (the previous
interpretation of $\delta$ as the average value of a function $f$ will now be irrelevant). We begin
by recalling the classical zero-free region for the Riemann $\zeta$ function.

Lemma A.1 (Zero-free region). Define the classical zero-free region $Z$ to be the closed
region

\[ Z := \Big\{ s \in \mathbb{C} : 10 \geq \Re s \geq 1 - \frac{\beta}{\log(|\Im s| + 2)} \Big\} \]

for some small $0 < \beta < 1$. Then if $\beta$ is sufficiently small, $\zeta$ is non-zero and meromorphic
in $Z$ with a simple pole at 1 and no other singularities. Furthermore we have the bounds

\[ \zeta(s) - \frac{1}{s-1} = O(\log(|\Im s| + 2)); \qquad \frac{1}{\zeta(s)} = O(\log(|\Im s| + 2)) \]

for all $s \in Z$.

Proof. See Titchmarsh [41, Chapter 3]. □

Fix $\beta$ in the above lemma; we may take $\beta$ to be small enough that $Z$ is contained
in the region where $1 - \sigma < \Re(s) < 101$. We will allow all our constants in the $O()$
notation to depend on $\beta$ and $\sigma$, and omit explicit mention of these dependencies from
our subscripts.

In addition to the contour $\Gamma_1$ defined in (10.4), we will need the two further contours
$\Gamma_0$ and $\Gamma_2$, defined by

\begin{equation}\tag{A.1}
\begin{aligned}
\Gamma_0(t) &:= -\frac{\beta}{\log(|t|+2)} + it, \qquad -\infty < t < \infty \\
\Gamma_2(t) &:= 1 + it, \qquad -\infty < t < \infty.
\end{aligned}
\end{equation}

Thus $\Gamma_0$ is the left boundary of $Z - 1$ (which therefore lies to the left of the origin),
while $\Gamma_1$ and $\Gamma_2$ are vertical lines to the right of the origin. The usefulness of $\Gamma_2$ for
us lies in the simple observation that $\zeta(1 + z + z')$ has no poles when $z \in Z - 1$ and
$z' \in \Gamma_2$, but we will not otherwise attempt to estimate any integrals on $\Gamma_2$.
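For readers who wish to experiment numerically, the three contours can be written as simple parametrisations. The sketch below makes two assumptions that go beyond the text of this appendix: the value `BETA = 0.05` is purely illustrative (the argument only requires $\beta$ to be sufficiently small), and $\Gamma_1$ is taken to be the vertical line with real part $1/\log R$, which is how we read the definition referred to as (10.4) together with the ranges $|t| \leq 1/\log R$ and $|t| > 1/\log R$ used later. The function names are ours.

```python
from math import log

BETA = 0.05  # illustrative value only; the text merely requires beta to be small


def gamma_0(t, beta=BETA):
    """Left boundary of Z - 1: real part -beta / log(|t| + 2), imaginary part t."""
    return complex(-beta / log(abs(t) + 2), t)


def gamma_1(t, R):
    """Assumed form of the contour from (10.4): the vertical line Re z = 1 / log R."""
    return complex(1.0 / log(R), t)


def gamma_2(t):
    """The vertical line Re z = 1."""
    return complex(1.0, t)


if __name__ == "__main__":
    R = 10.0 ** 6
    for t in (0.0, 1.0, 100.0, 1e6):
        # gamma_0 stays strictly left of the origin and approaches the imaginary axis as |t| grows.
        print(t, gamma_0(t), gamma_1(t, R), gamma_2(t))
```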



We observe the following elementary integral estimates. 
Lemma A.2. Let $A, B$ be fixed constants with $A > 1$. Then we have the bounds

\[ \int_{\Gamma_0} \log^B(|z|+2) \left| \frac{R^z \, dz}{z^A} \right| \leq O_{A,B}(e^{-\delta\sqrt{\log R}}) \tag{A.2} \]

\[ \int_{\Gamma_1} \log^B(|z|+2) \left| \frac{R^z \, dz}{z^A} \right| \leq O_{A,B}(\log^{A-1} R). \tag{A.3} \]

Here $\delta = \delta(A,B,\beta) > 0$ is a constant independent of $R$.



Proof. We first bound the left-hand side of (A.2). Substitute in the parametrisation
(A.1). Since $\Gamma_0'(t) = O(1)$ and $|z| \gg |t| + \beta$ we have, for any $T \geq 2$,

\begin{align*}
\int_{\Gamma_0} \log^B(|z|+2) \left| \frac{R^z \, dz}{z^A} \right|
&\leq O\Big( \int_0^\infty R^{-\beta/\log(|t|+2)} \frac{\log^B(|t|+2)}{(|t|+\beta)^A} \, dt \Big) \\
&\leq O_{A,B}\Big( \log^B T \int_0^T R^{-\beta/\log T} \, dt + \int_T^\infty \frac{\log^B t}{t^A} \, dt \Big) \\
&\leq O_{A,B}\big( T \log^B T \exp(-\beta \log R/\log T) + T^{1-A} \log^B T \big).
\end{align*}

Choosing $T = \exp(\sqrt{\beta \log R/2})$ one obtains the claimed bound. The bound (A.3) is
much simpler, and can be obtained by noting that $R^z$ is bounded on $\Gamma_1$, and substituting
in (10.4), splitting the integrand up into the ranges $|t| \leq 1/\log R$ and $|t| > 1/\log R$. □
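It may help to spell out why a choice of $T$ of this shape suffices for (A.2). The following is a sketch under the assumption that $T$ is chosen with $\log T = \sqrt{\beta \log R/2}$; the range of $\delta$ named at the end is only one admissible choice. With this $T$,

\[ \frac{\beta \log R}{\log T} = \sqrt{2\beta \log R}, \qquad T \exp\Big(-\frac{\beta \log R}{\log T}\Big) = \exp\big(\sqrt{\beta \log R/2} - \sqrt{2\beta \log R}\big) = \exp\big(-\sqrt{\beta \log R/2}\big), \]

\[ T^{1-A} = \exp\big(-(A-1)\sqrt{\beta \log R/2}\big), \qquad \log^B T = (\beta \log R/2)^{B/2}. \]

A fixed power of $\log R$ is absorbed by an arbitrarily small loss in the exponent, so both terms are $O_{A,B}(e^{-\delta\sqrt{\log R}})$ for any fixed $0 < \delta < \min(1, A-1)\sqrt{\beta/2}$.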

The next lemma is closely related to the case $m = 1$ of Lemma 10.4.

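Lemma A.3. Let $f(z, z')$ be analytic in $D_\sigma^1$ and suppose that

\[ |f(z,z')| \leq \exp(O_m(\log^{1/3} R)) \]

uniformly in this domain. Then the integral

\[ I := \frac{1}{(2\pi i)^2} \int_{\Gamma_1}\int_{\Gamma_1} f(z,z') \frac{\zeta(1+z+z')}{\zeta(1+z)\zeta(1+z')} \frac{R^{z+z'}}{z^2 z'^2} \, dz \, dz' \]

obeys the estimate

\[ I = f(0,0)\log R + \frac{\partial f}{\partial z'}(0,0) + \frac{1}{2\pi i}\int_{\Gamma_0} f(z,-z) \frac{dz}{\zeta(1+z)\zeta(1-z)z^4} + O_m(e^{-\delta\sqrt{\log R}}) \]

for some $\delta = \delta(\sigma, \beta) > 0$ independent of $R$.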



Proof. We observe from Lemma A.1 that we have enough decay of the integrand in
the domain $D_\sigma^1$ to interchange the order of integration, and to shift contours in either
one of the variables $z, z'$ while keeping the other fixed, without any difficulties when
$\Im(z), \Im(z') \to \infty$; the only issue is to keep track of when the contour passes through
a pole of the integrand. In particular we can shift the $z'$ contour from $\Gamma_1$ to $\Gamma_2$, since
we do not encounter any of the poles of the integrand while doing so. Let us look at
the integrand for each fixed $z' \in \Gamma_2$, viewing it as an analytic function of $z$. We now
attempt to shift the $z$ contour of integration to $\Gamma_0$. In so doing the contour passes just
one pole, a simple one at $z = 0$. The residue there is
$\frac{1}{2\pi i}\int_{\Gamma_2} f(0,z') \frac{R^{z'}}{z'^2} \, dz'$, and so we have $I = I_1 + I_2$, where

\begin{align*}
I_1 &:= \frac{1}{2\pi i}\int_{\Gamma_2} f(0,z') \frac{R^{z'}}{z'^2} \, dz' \\
I_2 &:= \frac{1}{(2\pi i)^2} \int_{\Gamma_2}\int_{\Gamma_0} f(z,z') \frac{\zeta(1+z+z')}{\zeta(1+z)\zeta(1+z')} \frac{R^{z+z'}}{z^2 z'^2} \, dz \, dz'.
\end{align*}



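To evaluate $I_1$, we shift the $z'$ contour of integration to $\Gamma_0$. Again there is just one pole,
a double one at $z' = 0$. The residue there is $f(0,0)\log R + \frac{\partial f}{\partial z'}(0,0)$, and so

\begin{align*}
I_1 &= f(0,0)\log R + \frac{\partial f}{\partial z'}(0,0) + \frac{1}{2\pi i}\int_{\Gamma_0} f(0,z') \frac{R^{z'}}{z'^2} \, dz' \\
&= f(0,0)\log R + \frac{\partial f}{\partial z'}(0,0) + O_m(e^{-\delta\sqrt{\log R}}),
\end{align*}

for some $\delta > 0$, the latter step being a consequence of our bound on $f$ and (A.2) (in
the case $B = 0$).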
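The residue quoted for $I_1$ comes from the standard formula for a double pole. As a short worked step, write $g(z') := f(0,z')R^{z'}$ (a name introduced here only for illustration), which is analytic near $z' = 0$; then, using $\frac{d}{dz'}R^{z'} = R^{z'}\log R$,

\[ \operatorname{Res}_{z'=0} \frac{g(z')}{z'^2} = g'(0) = \frac{\partial f}{\partial z'}(0,0)\,R^0 + f(0,0)\,R^0 \log R = f(0,0)\log R + \frac{\partial f}{\partial z'}(0,0). \]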

To estimate $I_2$, we first swap the order of integration and, for each fixed $z$, view the
integrand as an analytic function of $z'$. We move the $z'$ contour from $\Gamma_2$ to $\Gamma_0$, this
again being allowed since we have sufficient decay in vertical strips as $|\Im z'| \to \infty$. In so
doing we pass exactly two simple poles, at $z' = -z$ and $z' = 0$. The residue at the first
is exactly

\[ \frac{1}{2\pi i}\int_{\Gamma_0} f(z,-z) \frac{dz}{\zeta(1+z)\zeta(1-z)z^4}, \]

which is one of the terms appearing in our formula for $I$.
The residue at $z' = 0$ is

\[ \frac{1}{2\pi i}\int_{\Gamma_0} f(z,0) \frac{R^z}{z^2} \, dz, \]

which is $O(e^{-\delta\sqrt{\log R}})$ for some $\delta > 0$ by (A.2). The value of $I_2$ is the sum of these two
quantities and the integral over the new contour $\Gamma_0$, which is

\[ \frac{1}{(2\pi i)^2} \int_{\Gamma_0}\int_{\Gamma_0} f(z,z') \frac{\zeta(1+z+z') R^{z+z'}}{\zeta(1+z)\zeta(1+z') z^2 z'^2} \, dz \, dz'. \tag{A.4} \]

In this integrand we have $|f| = \exp(O_m(\log^{1/3} R))$ and, by Lemma A.1, $1/|\zeta(1+z)| \ll
\log(|\Im z|+2)$ and $1/|\zeta(1+z')| \ll \log(|\Im z'|+2)$. Assume that $\beta < 1/10$, as we obviously
may. We claim that

\[ |\zeta(1+z+z')| \ll (1 + |z| + |z'|)^{1/4} \ll (1+|z|)^{1/4}(1+|z'|)^{1/4} \tag{A.5} \]

for all $z, z' \in \Gamma_0$. Once this is proven it follows from (A.2), applied with $A = 7/4$ and
$A = 2$, that the integral (A.4) is bounded by $O_m(e^{-\delta\sqrt{\log R}})$ for some $\delta > 0$. Now if
$1/2 \leq \sigma \leq 1$ and $|t| \geq 1/100$ we have the convexity bound $|\zeta(\sigma + it)| \ll_\epsilon |t|^{1-\sigma+\epsilon}$ (cf.
[41, Chapter V]), and so (A.5) is indeed true provided that $|\Im(z+z')| \geq 1/100$. However
since $z, z' \in \Gamma_0$ one may see that if $|\Im(z+z')| \leq t$ then $|z+z'| \gg 1/\log(t+2)$. It
follows from Lemma A.1 that (A.5) holds when $|\Im(z+z')| \leq 1/100$ as well.



Thus we now have estimates for $I_1$ and $I_2$ up to errors of $O_m(e^{-\delta\sqrt{\log R}})$. Putting all of
this together completes the proof of the lemma. □
Proof of Lemma 10.4. Let $G = G(z, z')$ be an analytic function of $2m$ complex variables
on the domain $D_\sigma^m$ obeying the derivative bounds (10.10). We will allow all our implicit
constants in the $O()$ notation to depend on $m, \beta, \sigma$. We are interested in the integral

\[ I(G,m) := \frac{1}{(2\pi i)^{2m}} \int_{\Gamma_1} \cdots \int_{\Gamma_1} G(z,z') \prod_{j=1}^m \frac{\zeta(1+z_j+z_j')}{\zeta(1+z_j)\zeta(1+z_j')} \frac{R^{z_j+z_j'}}{z_j^2 z_j'^2} \, dz_j \, dz_j' \]

and wish to prove the estimate

\[ I(G,m) = G(0,\ldots,0)(\log R)^m + \sum_{j=1}^m O\big(\|G\|_{C^j(D_\sigma^m)} (\log R)^{m-j}\big) + O(e^{-\delta\sqrt{\log R}}). \]

The proof is by induction on $m$. The case $m = 1$ is a swift deduction from Lemma A.3,
the only issue being an estimation of the term

\[ \frac{1}{2\pi i}\int_{\Gamma_0} G(z_1,-z_1) \frac{dz_1}{\zeta(1+z_1)\zeta(1-z_1)z_1^4}. \]

It is not hard to check (using Lemma A.1) that

\[ \int_{\Gamma_0} \left| \frac{dz_1}{\zeta(1+z_1)\zeta(1-z_1)z_1^4} \right| = O(1), \tag{A.6} \]

and so this term is $O(\sup_{z \in D_\sigma^1} |G(z)|) = O(\|G\|_{C^1(D_\sigma^1)})$.

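Suppose then that we have established the result for $m \geq 1$ and wish to deduce it for
$m + 1$. Applying Lemma A.3 in the variables $z_{m+1}, z'_{m+1}$, we get $I(G, m+1) =$

\begin{align*}
& \frac{\log R}{(2\pi i)^{2m}} \int_{\Gamma_1} \cdots \int_{\Gamma_1} G(z_1,\ldots,z_m,0,z_1',\ldots,z_m',0) \prod_{j=1}^m \frac{\zeta(1+z_j+z_j') R^{z_j+z_j'}}{\zeta(1+z_j)\zeta(1+z_j') z_j^2 z_j'^2} \, dz_j \, dz_j' \\
& + \frac{1}{(2\pi i)^{2m}} \int_{\Gamma_1} \cdots \int_{\Gamma_1} H(z_1,\ldots,z_m,z_1',\ldots,z_m') \prod_{j=1}^m \frac{\zeta(1+z_j+z_j') R^{z_j+z_j'}}{\zeta(1+z_j)\zeta(1+z_j') z_j^2 z_j'^2} \, dz_j \, dz_j' \\
& + O(e^{-\delta\sqrt{\log R}}) \\
& = I(G(z_1,\ldots,z_m,0,z_1',\ldots,z_m',0), m) \log R + I(H,m) + O(e^{-\delta\sqrt{\log R}})
\end{align*}

where $\delta > 0$ and $H : D_\sigma^m \to \mathbb{C}$ is the function

\begin{align*}
H(z_1,\ldots,z_m,z_1',\ldots,z_m') := {} & \frac{\partial G}{\partial z'_{m+1}}(z_1,\ldots,z_m,0,z_1',\ldots,z_m',0) \\
& + \frac{1}{2\pi i}\int_{\Gamma_0} G(z_1,\ldots,z_m,z_{m+1},z_1',\ldots,z_m',-z_{m+1}) \frac{dz_{m+1}}{\zeta(1+z_{m+1})\zeta(1-z_{m+1})z_{m+1}^4}.
\end{align*}

The error term $O(e^{-\delta\sqrt{\log R}})$ which we claim here arises by applying (10.10) and several
applications of (A.3).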

Now both of the functions $G(z_1,\ldots,z_m,0,z_1',\ldots,z_m',0)$ and $H(z_1,\ldots,z_m,z_1',\ldots,z_m')$
are analytic on $D_\sigma^m$ and (appealing to (A.6)) we have $\|H\|_{C^j(D_\sigma^m)} = O_m(\|G\|_{C^{j+1}(D_\sigma^{m+1})})$
for $0 \leq j \leq m$. Using the inductive hypothesis, we therefore obtain $I(G, m+1) =$

\begin{align*}
& G(0,\ldots,0)(\log R)^{m+1} + \sum_{j=1}^m O_m\big(\|G(\cdot,0,\cdot,0)\|_{C^j(D_\sigma^m)} (\log R)^{m+1-j}\big) \\
& \quad + H(0,\ldots,0)(\log R)^m + \sum_{j=1}^m O_m\big(\|H\|_{C^j(D_\sigma^m)} (\log R)^{m-j}\big) + O(e^{-\delta\sqrt{\log R}}) \\
& = G(0,\ldots,0)(\log R)^{m+1} + \sum_{j=1}^m O_m\big(\|G\|_{C^j(D_\sigma^{m+1})} (\log R)^{m+1-j}\big) \\
& \quad + H(0,\ldots,0)(\log R)^m + \sum_{j=1}^m O_m\big(\|G\|_{C^{j+1}(D_\sigma^{m+1})} (\log R)^{m-j}\big) + O(e^{-\delta\sqrt{\log R}}) \\
& = G(0,\ldots,0)(\log R)^{m+1} + \sum_{j=1}^{m+1} O_m\big(\|G\|_{C^j(D_\sigma^{m+1})} (\log R)^{m+1-j}\big) + O(e^{-\delta\sqrt{\log R}}),
\end{align*}

which is what we wanted to prove. □
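The final equality in this chain is a matter of bookkeeping, which can be sketched as follows (the index $j'$ is introduced here only for illustration). The case $j = 0$ of the bound on $H$ gives

\[ |H(0,\ldots,0)| (\log R)^m \leq \|H\|_{C^0(D_\sigma^m)} (\log R)^m = O_m\big(\|G\|_{C^1(D_\sigma^{m+1})} (\log R)^{(m+1)-1}\big), \]

which is of the form of the $j = 1$ term of the final sum, while the substitution $j' = j + 1$ gives

\[ \sum_{j=1}^m \|G\|_{C^{j+1}(D_\sigma^{m+1})} (\log R)^{m-j} = \sum_{j'=2}^{m+1} \|G\|_{C^{j'}(D_\sigma^{m+1})} (\log R)^{m+1-j'}. \]

Both contributions are therefore covered by the sum over $1 \leq j \leq m+1$ in the last line.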